Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery
LeRoy Tuttle, Jr.
Quick Guide
Microsoft
-
Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra

Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones

Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.

Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper (link to source content)
E-book publication date: May 2012
32 pages
-
Copyright 2012 by Microsoft Corporation
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.
Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. This book expresses the author's views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
-
Contents

High Availability and Disaster Recovery Concepts
    Describing High Availability
        Planned vs. Unplanned Downtime
        Degraded Availability
    Quantifying Downtime
        Recovery Objectives
        Justifying ROI or Opportunity Cost
        Monitoring Availability Health
        Planning for Disaster Recovery
    Overview: High Availability with Microsoft SQL Server 2012
        SQL Server AlwaysOn
        Significantly Reduce Planned Downtime
        Eliminate Idle Hardware and Improve Cost Efficiency and Performance
        Easy Deployment and Management
        Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
    Infrastructure Availability
        Windows Operating System
        Windows Server Failover Clustering
        WSFC Cluster Validation Wizard
        WSFC Quorum Modes and Voting Configuration
        WSFC Disaster Recovery through Forced Quorum
    SQL Server Instance Level Protection
        Availability Improvements - SQL Server Instances
        AlwaysOn Failover Cluster Instances
    Database Availability
        AlwaysOn Availability Groups
        Availability Group Failover
        Availability Group Listener
        Availability Improvements - Databases
    Client Connectivity Recommendations
Conclusion
-
High Availability and Disaster Recovery Concepts

You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders have a shared understanding of the related business drivers and challenges, and of how Recovery Time Objective (RTO) and Recovery Point Objective (RPO) goals are planned, managed, and measured.

Readers who are familiar with these concepts can move ahead to the "Overview: High Availability with Microsoft SQL Server 2012" section of this paper.

Describing High Availability

For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.

The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.

A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:

    Availability = (Actual uptime / Expected uptime) × 100%

The industry often expresses the resulting value as the number of 9's that the solution provides, which conveys the annual number of minutes of possible uptime or, conversely, of downtime.

Number of 9's   Availability Percentage   Total Annual Downtime
2               99%                       3 days, 15 hours
3               99.9%                     8 hours, 45 minutes
4               99.99%                    52 minutes, 34 seconds
5               99.999%                   5 minutes, 15 seconds
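The downtime figures in the table can be checked with a short calculation. The following is a minimal sketch, assuming that availability is the ratio of actual uptime to expected uptime and that a year has 365 days; the function names are illustrative and do not come from the paper.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day year


def availability_percent(actual_uptime_hours: float,
                         expected_uptime_hours: float) -> float:
    """Availability as a percentage of the expected uptime."""
    return actual_uptime_hours / expected_uptime_hours * 100.0


def annual_downtime(availability_pct: float) -> timedelta:
    """Yearly downtime implied by a given availability percentage."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


# Print the downtime budget for two to five 9's.
for nines, pct in [(2, 99.0), (3, 99.9), (4, 99.99), (5, 99.999)]:
    print(nines, pct, annual_downtime(pct))
```

Working in `timedelta` keeps the result in days, hours, minutes, and seconds, which is how an SLA usually states its downtime budget.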
Planned vs. Unplanned Downtime

System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be considered negatively if it is appropriately managed. There are two key types of foreseeable downtime:

• Planned maintenance. A time window is preannounced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline re-indexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other potentially more severe unplanned outage scenarios.
• Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable but considered either too unlikely to occur or to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then re-establishes fault tolerance.
When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.

Degraded Availability

High availability should not be considered as an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:

• Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.
• Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be over-committed or under-sized. User experience may suffer, but work may still get done in a less productive manner.
• Partial, transient, or impending failures. Robust application logic or a robust hardware stack may retry or self-correct upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.
• Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.

The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.

Quantifying Downtime

When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.
At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and if needed, re-establish fault tolerance.

Recovery Objectives

Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation.

You can both measure the impact and set recovery goals in terms of how long it takes to get back in business, and how much time latency there is in the last transaction recovered:

• Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.
• Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.
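The two definitions above reduce to timestamp arithmetic. The sketch below is illustrative only: the function names, parameters, and sample timestamps are assumptions made for this example and are not part of any SQL Server API.

```python
from datetime import datetime, timedelta


def measure_outage(outage_start: datetime,
                   service_restored: datetime,
                   last_commit_before_failure: datetime,
                   last_commit_recovered: datetime) -> tuple[timedelta, timedelta]:
    """Return (recovery time, recovery point gap) observed for one outage."""
    recovery_time = service_restored - outage_start
    recovery_point_gap = last_commit_before_failure - last_commit_recovered
    return recovery_time, recovery_point_gap


def meets_objectives(recovery_time: timedelta, recovery_point_gap: timedelta,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True when the observed values are within the agreed RTO and RPO."""
    return recovery_time <= rto and recovery_point_gap <= rpo


# Illustrative timestamps only.
failure = datetime(2012, 5, 1, 10, 0, 0)
observed_time, observed_gap = measure_outage(
    outage_start=failure,
    service_restored=failure + timedelta(minutes=4),
    last_commit_before_failure=failure - timedelta(seconds=1),
    last_commit_recovered=failure - timedelta(seconds=9),
)
print(observed_time, observed_gap)
print(meets_objectives(observed_time, observed_gap,
                       rto=timedelta(minutes=5), rpo=timedelta(seconds=10)))
```

Recording both values for every outage and every rehearsal gives the observed figures that the RTO and RPO goals can be compared against.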
You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.

Justifying ROI or Opportunity Cost

The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:

• Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.
• Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.
• Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It also can be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.
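The two kinds of downtime cost described in this section, costs that accrue with time and costs incurred at a certain point in the outage window, can be combined into one projection. This is a minimal sketch with hypothetical figures; `downtime_cost`, its parameters, and the sample penalty are assumptions for illustration.

```python
def downtime_cost(minutes_down: float,
                  cost_per_minute: float,
                  step_costs: dict[float, float]) -> float:
    """Projected cost of an outage that has lasted minutes_down minutes.

    cost_per_minute models costs that accrue with time. step_costs maps a
    threshold (minutes into the outage) to a one-time cost incurred once the
    outage reaches that point, for example a contractual penalty.
    """
    accrued = minutes_down * cost_per_minute
    one_time = sum(cost for threshold, cost in step_costs.items()
                   if minutes_down >= threshold)
    return accrued + one_time


# Hypothetical figures: 100 per minute, plus a 5000 penalty after one hour.
penalties = {60: 5000.0}
for minutes in (30, 60, 90):
    print(minutes, downtime_cost(minutes, 100.0, penalties))
```

Comparing this projection with the cost of an availability investment is one way to express the justification as a function of elapsed time.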
For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.

Monitoring Availability Health

From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO.

In the event of an outage, you should also limit the initial time spent investigating the root cause during the outage, and instead focus on validating the health of your recovery environment, and then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.

Planning for Disaster Recovery

While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to re-establish high availability after the outage.

As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to pre-established RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:

• Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.
• Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible by appropriate parties.
• Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?
• Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.
• Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?
• Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity so that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a "run book" or a "cook book."
• Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regular rotation of hosting the primary production site on the primary and each of the disaster recovery sites.
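A decision tree that ties failover to pre-established thresholds can be small. The sketch below is one possible shape, not a prescribed procedure: the `Thresholds` fields, the `investigation_share` policy, and the returned step names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Thresholds:
    rpo: timedelta              # acceptable data loss
    rto: timedelta              # acceptable outage duration
    investigation_share: float  # assumed policy: fraction of the RTO allowed for triage


def next_step(outage_elapsed: timedelta, standby_lag: timedelta,
              limits: Thresholds) -> str:
    """Return the prescribed next step for the current outage state.

    standby_lag is the data latency on the standby instance, used here as a
    proxy for the data loss that a failover would incur.
    """
    if outage_elapsed < limits.rto * limits.investigation_share:
        return "investigate"
    if standby_lag <= limits.rpo:
        return "fail over"
    return "escalate: failover would exceed RPO"


limits = Thresholds(rpo=timedelta(seconds=30), rto=timedelta(minutes=10),
                    investigation_share=0.25)
print(next_step(timedelta(minutes=1), timedelta(seconds=5), limits))
print(next_step(timedelta(minutes=3), timedelta(seconds=5), limits))
print(next_step(timedelta(minutes=3), timedelta(minutes=2), limits))
```

Encoding the criteria this way makes the triage cutoff explicit, so the decision to stop investigating is not improvised during the outage.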
Overview: High Availability with Microsoft SQL Server 2012

Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protection of critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping the cost and complexity low.

Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the "SQL Server AlwaysOn Layers of Protection" section of this paper.

SQL Server AlwaysOn

AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across data centers, and it improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.

An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:

• AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure availability of application databases, and they enable zero data loss through log-based data movement for data protection without shared disks. Availability groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
• AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.
Significantly Reduce Planned Downtime

The key reason for application downtime in any organization is planned downtime caused by operating system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the outages in an IT environment. SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:

• Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent.
• Online Operations. Enhanced support for online operations like LOB re-indexing and adding columns with default values helps to reduce downtime during database maintenance operations.
• Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime.
• SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.
Eliminate Idle Hardware and Improve Cost Efficiency and Performance

Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to simultaneously utilize both the primary and secondary database replicas helps improve performance of all workloads due to better resource balancing across your server hardware investments.

Easy Deployment and Management

Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.

Contrasting RPO and RTO Capabilities

The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution. This table offers a rough comparison of the type of results that those different solutions may achieve:
High Availability and Disaster Recovery                 Potential Data    Potential Recovery      Automatic   Readable
SQL Server Solution                                     Loss (RPO)        Time (RTO)              Failover    Secondaries (1)

AlwaysOn Availability Group - synchronous-commit        Zero              Seconds                 Yes (4)     0 - 2
AlwaysOn Availability Group - asynchronous-commit       Seconds           Minutes                 No          0 - 4
AlwaysOn Failover Cluster Instance                      NA (5)            Seconds to minutes      Yes         NA
Database Mirroring (2) - High-safety (sync + witness)   Zero              Seconds                 Yes         NA
Database Mirroring (2) - High-performance (async)       Seconds (6)       Minutes (6)             No          NA
Log Shipping                                            Minutes (6)       Minutes to hours (6)    No          Not during a restore
Backup, Copy, Restore (3)                               Hours (6)         Hours to days (6)       No          Not during a restore

(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
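As a rough illustration of using RPO and RTO goals as selection drivers, the table can be encoded as data and filtered. This sketch makes several assumptions: the qualitative values are ranked by their upper bound, the footnote caveats are ignored, and the failover cluster instance row is left out because its data loss is listed as NA. The names `SOLUTIONS`, `SCALE`, and `candidates` are invented for this example.

```python
SOLUTIONS = [
    # (name, potential data loss, potential recovery time, automatic failover)
    ("AlwaysOn Availability Group, synchronous-commit", "zero", "seconds", True),
    ("AlwaysOn Availability Group, asynchronous-commit", "seconds", "minutes", False),
    ("Database Mirroring, high-safety (sync + witness)", "zero", "seconds", True),
    ("Database Mirroring, high-performance (async)", "seconds", "minutes", False),
    ("Log Shipping", "minutes", "minutes to hours", False),
    ("Backup, Copy, Restore", "hours", "hours to days", False),
]

# Coarse ordering of the table's qualitative values, smallest first.
SCALE = ["zero", "seconds", "minutes", "hours", "days"]


def worst_case(value: str) -> int:
    """Rank a table value by its upper bound, e.g. 'minutes to hours' ranks as hours."""
    return SCALE.index(value.split(" to ")[-1])


def candidates(max_data_loss: str, max_recovery_time: str,
               need_auto_failover: bool = False) -> list[str]:
    """Names of the solutions whose rough results fit the stated goals."""
    return [name for name, rpo, rto, auto in SOLUTIONS
            if worst_case(rpo) <= worst_case(max_data_loss)
            and worst_case(rto) <= worst_case(max_recovery_time)
            and (auto or not need_auto_failover)]


print(candidates("zero", "seconds", need_auto_failover=True))
print(candidates("seconds", "minutes"))
```

The output is only a shortlist; the footnotes and the workload-dependent nature of the figures still need to be weighed for each candidate.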
SQL Server AlwaysOn Layers of Protection

SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been a common practice to have a separation of duties and responsibilities for the various involved audiences and roles, such that each was predominantly concerned with only a portion of those solution layers.

This section of the paper is organized to walk through a deeper description of each of those layers, and to offer rationale and guidance for your design discussions and implementation decisions.

A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:

• Infrastructure level. Server-level fault tolerance and inter-node network communication leverage Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
• SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).
• Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
• Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
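The database-level rules in the list above (one primary replica, one to four secondary replicas, each replica on a different WSFC node) can be written as a small planning check. This is a sketch for reasoning about a proposed topology; the `Replica` type and node names are hypothetical and nothing here queries SQL Server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    node: str  # WSFC node that hosts the SQL Server instance
    role: str  # "primary" or "secondary"


def validate_availability_group(replicas: list[Replica]) -> list[str]:
    """Return a list of rule violations; an empty list means the plan is valid."""
    problems = []
    primaries = [r for r in replicas if r.role == "primary"]
    secondaries = [r for r in replicas if r.role == "secondary"]
    if len(primaries) != 1:
        problems.append("exactly one primary replica is required")
    if not 1 <= len(secondaries) <= 4:
        problems.append("one to four secondary replicas are required")
    nodes = [r.node for r in replicas]
    if len(nodes) != len(set(nodes)):
        problems.append("each replica must be hosted on a different node")
    return problems


plan = [Replica("A1", "primary"), Replica("A2", "secondary"),
        Replica("B1", "secondary")]
print(validate_availability_group(plan))
```

Returning a list of messages rather than a single boolean lets a design review see every violated rule at once.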
The logical topology of a representative AlwaysOn solution is illustrated in this diagram:

[Diagram: A Windows Server Failover Clustering (WSFC) cluster spans Network Subnet A (Nodes A1, A2, and A3) and Network Subnet B (Nodes B1 and B2), with WSFC configuration held on every node. The cluster hosts SQL Server Failover Cluster Instance 1 and SQL Server Instances 2, 3, and 4, each with its own instance network name. An AlwaysOn availability group, reached through an availability group listener and its virtual network name, has one primary replica and three secondary replicas. The diagram also shows one shared storage unit, three other storage units, and an optional WSFC quorum witness on a remote file share.]
Infrastructure Availability

Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.

Windows Operating System

SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring.

The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server 2008 R2 Standard, Windows Server 2008 R2 Enterprise, and Windows Server 2008 R2 Datacenter.

For more information, see Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).

Windows Server Core Installation Option

As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command prompt environment are enabled.

This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.

A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.

Optimizing SQL Server for Private Cloud

High availability and disaster recovery scenarios are increasingly critical in the Private Cloud environment. Deploy SQL Server to your Private Cloud to help ensure that your compute, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. It helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.

In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering.

For more information, see Private Cloud Computing - Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).
Windows Server Failover Clustering

Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server. If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies to both FCIs and to availability groups.

The nodes in the WSFC cluster work together to collectively provide these types of capabilities:

• Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.
• Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.
• Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.
• Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
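The health monitoring idea, heartbeats plus a quorum vote, can be pictured with a toy model. This sketch assumes a simple strict-majority rule with one vote per node and uses made-up heartbeat settings; it does not reproduce the actual WSFC quorum modes or timing parameters.

```python
def healthy_nodes(seconds_since_heartbeat: dict[str, float],
                  interval: float = 1.0,
                  missed_allowed: int = 5) -> set[str]:
    """Nodes whose last heartbeat is recent enough to count as alive.

    interval and missed_allowed are illustrative values, not WSFC settings.
    """
    limit = interval * missed_allowed
    return {node for node, age in seconds_since_heartbeat.items() if age <= limit}


def has_quorum(votes_online: int, total_votes: int) -> bool:
    """True when a strict majority of the configured votes is online."""
    return votes_online > total_votes / 2


# Seconds since each node was last heard from (hypothetical node names).
ages = {"A1": 0.4, "A2": 0.9, "A3": 12.0, "B1": 0.7, "B2": 30.0}
alive = healthy_nodes(ages)
print(sorted(alive), has_quorum(len(alive), len(ages)))
```

A strict majority is used so that two halves of a partitioned cluster can never both believe that they hold quorum.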
For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failoverclusteringmain.aspx).

Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.

WSFC Storage Configurations

Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust, and therefore if the storage device attached to a node is unavailable, the cluster node is considered to be at fault.

For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.

SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration combinations, including:

• Direct-attached vs. remote. Storage devices are directly physically attached to the server, or they are presented by a remote device through a network or host bus adapter (HBA). Remote storage technologies include Storage Area Network (SAN) based solutions such as iSCSI or Fibre Channel, as well as Server Message Block (SMB) file-share-based solutions.
• Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk volume configuration and file paths are presented to each node in the cluster. The physical implementation and capacity of the underlying disk volumes can vary.
• Dedicated vs. shared. Dedicated storage is reserved for use and assigned to a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared volume.
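The symmetric storage definition above amounts to comparing what each node is presented. The following is a minimal sketch that treats a node's storage layout as a set of volume paths; the function name, node names, and paths are hypothetical, and a real check would also compare the logical disk volume configuration.

```python
def is_symmetric(volumes_by_node: dict[str, set[str]]) -> bool:
    """True when every node is presented the same set of volume paths."""
    layouts = list(volumes_by_node.values())
    return all(layout == layouts[0] for layout in layouts[1:])


# Hypothetical layouts: two nodes see the same volumes, a third does not.
symmetric = {"A1": {"E:\\Data", "F:\\Log"}, "A2": {"E:\\Data", "F:\\Log"}}
asymmetric = dict(symmetric, B1={"D:\\Data"})
print(is_symmetric(symmetric), is_symmetric(asymmetric))
```

Only the logical layout is compared, which matches the point that the physical implementation and capacity of the underlying volumes can vary.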
Note:SQLServerFCIsstillrequiresymmetricalsharedstoragetobeaccessiblebyallpossiblenodeownersoftheinstance.However,withtheintroductionofAlwaysOnAvailabilityGroups,youcannowdeploydifferentnonFCIinstancesofSQLServerinaWSFCcluster,eachwithitsownunique,dedicated,localorremotestorage.WSFCResourceHealthDetectionandFailoverEachresourceinaWSFCclusternodecanreportitsstatusandhealth,periodicallyorondemand.Avarietyofcircumstancesmayindicateaclusterresourcefailure,including:powerfailure,diskormemoryerrors,networkcommunicationerrors,misconfiguration,ornonresponsiveservices.YoucanmakeWSFCclusterresourcessuchasnetworks,storage,orservicesdependentupononeanother.Thecumulativehealthofaresourceisdeterminedbysuccessiverollupofitshealthwiththehealthofeachofitsresourcedependencies.ForAlwaysOnAvailabilityGroups,theavailabilitygroupandtheavailabilitygrouplistenerareregisteredasWSFCclusterresources.ForAlwaysOnFailoverClusterInstances,theSQLServerserviceandtheSQLServerAgentserviceareregisteredasWSFCclusterresources,andbotharemadedependentupontheinstancesvirtualnetworknameresource.IfaWSFCclusterresourceexperiencesasetnumberoferrorsorfailuresoveraperiodoftime,theconfiguredfailoverpolicycausestheclusterservicetodooneofthefollowing:
Restarttheresourceonthecurrentnode. Settheresourceoffline. Initiateanautomaticfailoveroftheresourceanditsdependenciestoanothernode.
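
The rollup of resource health through dependencies described above can be pictured with a small model. This is only an illustrative sketch in Python, not the WSFC implementation; the Resource class and the cumulative_health function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical stand-in for a WSFC cluster resource."""
    name: str
    healthy: bool = True
    depends_on: List["Resource"] = field(default_factory=list)


def cumulative_health(resource: Resource) -> bool:
    """Roll up a resource's own health with that of every dependency."""
    return resource.healthy and all(
        cumulative_health(dep) for dep in resource.depends_on
    )


# FCI example from the text: both services depend on the virtual network name.
vnn = Resource("virtual network name")
sql_service = Resource("SQL Server service", depends_on=[vnn])
agent_service = Resource("SQL Server Agent service", depends_on=[vnn])
```

In this model, marking the virtual network name unhealthy makes both dependent services report as unhealthy, which is the condition under which a failover policy would act.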
Note: WSFC cluster resource health detection has no direct impact on the individual node's health or the overall health of the cluster.

WSFC Cluster Validation Wizard

The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server 2008 and Windows Server 2008 R2. It is a key tool for a database administrator to use to help ensure that a clean, healthy, stable WSFC environment exists before deploying a SQL Server AlwaysOn solution.

With the cluster validation wizard, you can run a set of focused tests on either a collection of servers that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC cluster would be supported on a given configuration.

This validation process consists of a series of tests and data collection on each node in these categories:

• Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.
• Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.
• Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.
• System configuration. Validates Active Directory configuration, that drivers are signed, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.

The results of these validation tests give you information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference.

You should run these tests before and after you make any changes to WSFC configuration, before you install SQL Server, and as a part of any disaster recovery process. A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration.

For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/enus/library/cc732035(WS.10).aspx).

Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps. For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/enus/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).
WSFC Quorum Modes and Voting Configuration

WSFC uses a quorum-based approach to monitoring overall cluster health and maximizing node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.

Cluster Health Detection by Quorum

Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.

A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance.

The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.

Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.

Quorum Modes

A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster.

One of the following quorum modes determines what constitutes a quorum of votes:

• Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.
• Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy. As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.
• Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
• Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.

For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/enus/library/cc770620(WS.10).aspx).

Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.

Voting and Non-Voting Nodes

By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.

Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, or appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of those nodes.

For all of the quorum models except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You should trust the quorum vote when all nodes are on the same physical subnet.

However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.

If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.

Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.

To simplify your quorum configuration and increase uptime, you may want to adjust a node's NodeWeight setting (a value of 0 or 1). A value of 0 means that the node's vote is not counted towards the quorum.
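
The majority arithmetic behind the node-and-witness quorum modes, including the effect of a NodeWeight of 0, can be sketched as follows. This is a simplified Python model for reasoning about vote counts, assuming one vote per witness and a strict "more than half" rule as described above. It does not model the Disk Only mode, and has_quorum is a hypothetical function name, not a WSFC API.

```python
from typing import Dict, Optional, Set


def has_quorum(
    node_weights: Dict[str, int],
    reachable_nodes: Set[str],
    witness_reachable: Optional[bool] = None,
) -> bool:
    """Return True if affirmative votes exceed half of all possible votes.

    node_weights maps node name -> NodeWeight (0 or 1).
    witness_reachable is None when no witness is configured.
    """
    possible = sum(node_weights.values())
    affirmative = sum(
        weight for node, weight in node_weights.items() if node in reachable_nodes
    )
    if witness_reachable is not None:
        # A file share or disk witness contributes a single vote.
        possible += 1
        affirmative += 1 if witness_reachable else 0
    return 2 * affirmative > possible
```

With four voting nodes and no witness, losing two nodes leaves exactly half of the votes, which is not a majority. Adding a reachable witness breaks the tie, which is why an even number of voting nodes is usually paired with a witness.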
Recommended Adjustments to Quorum Voting

To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:

1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.
6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.

For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/enus/library/hh270281(SQL.110).aspx).

You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.

Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting. For more information, see Monitor Availability Groups (http://msdn.microsoft.com/enus/library/ff878305(SQL.110).aspx).
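
Guidelines 1 through 5 can be read as a small decision procedure. The Python sketch below is one possible interpretation, not a Microsoft tool. ClusterNode and recommend_votes are hypothetical names, and letting the secondary-site rule override the earlier rules is an assumption that follows from applying the guidelines in order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ClusterNode:
    name: str
    hosts_primary_or_preferred_fci: bool = False
    automatic_failover_target: bool = False
    at_secondary_site: bool = False


def recommend_votes(nodes: List[ClusterNode]) -> Tuple[Dict[str, int], bool]:
    """Return (NodeWeight per node, whether a witness is needed for an odd total)."""
    votes: Dict[str, int] = {}
    for node in nodes:
        weight = 0                                # 1. no vote by default
        if node.hosts_primary_or_preferred_fci:   # 2. include primary nodes
            weight = 1
        if node.automatic_failover_target:        # 3. include automatic failover owners
            weight = 1
        if node.at_secondary_site:                # 4. exclude secondary site nodes
            weight = 0
        votes[node.name] = weight
    # 5. an even vote total can tie, so a witness would be recommended
    witness_needed = sum(votes.values()) % 2 == 0
    return votes, witness_needed
```

Guideline 6 is procedural: after any failover, run the same assessment again against the new topology.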
WSFC Disaster Recovery through Forced Quorum

Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and Availability Groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance.

A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.

To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology as well.

You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.

This type of disaster recovery process should include the following steps:

1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica. For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/enus/library/hh270275(v=SQL.110).aspx).
   Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes. As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.
   Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is, a split-brain scenario. If your findings in step 1 are accurate, this should not occur.
4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration. Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.
   Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, or the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps. You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.
6) Repair or replace failed components and revalidate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
   Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow and you will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to reestablish the appropriate level of fault tolerance and high availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document actual Recovery Point and Recovery Time experiences.
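
The replica recovery sequence recommended in step 5 amounts to a simple ordering rule. A minimal Python sketch, with hypothetical replica names and role labels chosen only for illustration:

```python
from typing import List, Tuple

# Lower number = bring online earlier (order taken from step 5 above).
RECOVERY_PRIORITY = {
    "primary": 0,
    "synchronous_secondary": 1,
    "asynchronous_secondary": 2,
}


def recovery_sequence(replicas: List[Tuple[str, str]]) -> List[str]:
    """Order (replica_name, role) pairs by the recommended recovery priority."""
    ordered = sorted(replicas, key=lambda replica: RECOVERY_PRIORITY[replica[1]])
    return [name for name, _ in ordered]
```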
SQL Server Instance-Level Protection

The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.

Availability Improvements - SQL Server Instances

These are new SQL Server 2012 instance-level features that enhance availability for both AlwaysOn Failover Cluster Instances and standalone instances that host AlwaysOn Availability Groups.

These improvements represent enhancements for managing and troubleshooting failover scenarios:

• Flexible Failover Policy. The output of the new system stored procedure used for robust failure detection, sp_server_diagnostics, uses the FailureConditionLevel property to convey the severity of a failure affecting the SQL Server instance. A WSFC failover policy governs how this value impacts the SQL Server instance, ranging from relative tolerance of errors to being sensitive to any SQL Server internal component error. You can configure failover to be triggered by any one of a range of error levels, including: server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies. Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover. For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/enus/library/ff878664(SQL.110).aspx).
• Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that captures and dumps information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies. For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/enus/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/enus/library/ms187341(SQL.110).aspx).
• SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both standalone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can very nearly approximate that of direct attached storage. For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabasesonfilesharesitstimetoreconsiderthescenario.aspx).
  Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.
• WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.
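
The flexible failover policy in this list can be thought of as a threshold on an ordered scale of failure conditions. The Python sketch below models that idea only. The enum member names follow the error levels named in the text, while the numeric ordering (1 for the most tolerant setting through 5 for the most sensitive) and the triggers_policy helper are assumptions of this sketch rather than a description of the cluster service internals.

```python
from enum import IntEnum


class FailureCondition(IntEnum):
    """Failure conditions ordered from most tolerant (1) to most sensitive (5)."""
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def triggers_policy(configured_level: FailureCondition,
                    observed: FailureCondition) -> bool:
    """A condition triggers the policy if it is at or below the configured level."""
    return observed <= configured_level
```

Under this model, a policy configured at CRITICAL_ERROR reacts to an unresponsive server but ignores a moderate error, while SERVER_DOWN tolerates everything short of the service being down.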

AlwaysOn Failover Cluster Instances

The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.

An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but only active on one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.

Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node.

Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence.

Database files are stored on shared symmetrical storage volumes that are registered as a resource with the WSFC cluster, and are owned by the node that currently hosts the FCI.

For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/enus/library/ms189134(SQL.110).aspx).

FCI Failover Process

If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to do a failover:

1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.
5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending timeout period, the cluster service puts the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.

FCI Improvements

Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability robustness and serviceability:

• Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency. Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.
  Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes. For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012alwayson_3a00_multisitefailoverclusterinstance.aspx).
• Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information. Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query timeout, for whatever reason, triggered a failover with very little available diagnostic information. For more information, see sp_server_diagnostics (http://msdn.microsoft.com/enus/library/ff878233(SQL.110).aspx).

There is now broader support for FCI storage scenarios:

• Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource dependency during setup.
• tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.
  Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with other system databases.
  Note: The location of tempdb is stored in the master database, which moves between nodes during failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.
Database Availability

The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.

AlwaysOn Availability Groups

An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.

AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage.

For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/enus/library/ff877884(SQL.110).aspx).

Availability Replicas and Roles

Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.

One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations.

An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.

Availability Replica Synchronization

The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.

Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes. In this manner, new entries in the primary replica transaction log are appended onto each of the secondary replicas' transaction logs.

Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.
Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.

In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:

• Synchronous-commit mode. The primary replica commits a given transaction only after all synchronous-commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to 2 synchronous-commit secondary replicas. Synchronous-commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.
• Asynchronous-commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit secondary replicas, but no more than a total of 4 secondary replicas of any type. Asynchronous-commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.

For more information, see Availability Modes (http://msdn.microsoft.com/enus/library/ff877931(SQL.110).aspx).

The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.

Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for synchronous-commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and caught back up with the primary replica, it automatically reverts to normal synchronous-commit mode operations.

Availability Group Failover

The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica.

An availability group failover policy uses the FailureConditionLevel property to indicate the severity tolerance level for a failure affecting the availability group, in conjunction with the sp_server_diagnostics system stored procedure. This same mechanism is used for FCI failover policies.
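
The commit coordination rules for the two availability modes, together with the session-timeout behavior, can be summarized in a small model. This Python sketch is illustrative only. The Secondary class and both functions are hypothetical names, and integer LSNs are a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Secondary:
    name: str
    synchronous_commit: bool
    hardened_lsn: int            # last LSN reported as hardened to disk
    session_timed_out: bool = False


def is_effectively_synchronous(secondary: Secondary) -> bool:
    """A timed-out synchronous-commit secondary is treated as asynchronous."""
    return secondary.synchronous_commit and not secondary.session_timed_out


def can_acknowledge_commit(commit_lsn: int, secondaries: List[Secondary]) -> bool:
    """The primary waits only for effectively synchronous secondaries."""
    return all(
        s.hardened_lsn >= commit_lsn
        for s in secondaries
        if is_effectively_synchronous(s)
    )
```

A lagging synchronous-commit secondary holds back the commit acknowledgement until it either catches up or times out, whereas an asynchronous-commit secondary never does.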
In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over the role of primary replica. The availability group's virtual network name resource is then transferred to that instance. All client connections to the involved availability replicas are reset.

Based upon the current health, synchronization state, and availability mode of the replicas, each replica has a composite failover readiness state that indicates the potential for data loss. This replica health information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states system view.

Each availability replica also has a configured failover mode, which governs replica behavior when failover is indicated.

• Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention. The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous-commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum. Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.
• Manual failover. This allows the administrator to assess the state of the primary replica, and make a decision to deliberately fail over to a secondary replica or not. Depending upon the availability mode and synchronization state, you have these choices:
  o Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.
  o Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous-commit availability mode, or if it is not synchronized with the primary replica.
    Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous commit and then perform a planned manual failover.
    For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/enus/library/ff877957(SQL.110).aspx).
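
The preconditions for each failover type listed above can be collected into one decision function. The Python sketch below is a simplified reading of those rules. It assumes both replicas are otherwise healthy, and failover_options and its parameter names are hypothetical.

```python
from typing import List


def failover_options(
    both_synchronous_commit: bool,
    both_automatic_failover_mode: bool,
    synchronized: bool,
    quorum_healthy: bool,
    either_on_fci: bool,
) -> List[str]:
    """List the failover types that the stated preconditions allow."""
    options: List[str] = []
    no_data_loss = both_synchronous_commit and synchronized
    if (no_data_loss and both_automatic_failover_mode
            and quorum_healthy and not either_on_fci):
        options.append("automatic")
    if no_data_loss:
        options.append("planned manual")
    # Always technically possible, but intended for disaster recovery only.
    options.append("forced manual (possible data loss)")
    return options
```

When the target secondary is in asynchronous-commit mode or is not synchronized, the function returns only the forced option, matching the warning in the text.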
You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:

• Failover mode is set to manual.
• Availability mode is set to asynchronous commit.
• Replica resides on an FCI.

For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/enus/library/hh213151(SQL.110).aspx).

Note: After a failover, if the new primary replica is not set to the synchronous-commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous-commit mode.

Availability Group Listener

An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.

The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.

To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.

At runtime, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses, until it is successful, or until it reaches the connection timeout. The client will attempt to make these connections in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.

In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports.

For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/enus/library/hh213417(SQL.110).aspx).
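
A client connection request through the listener names the virtual network name as the server and a database in the availability group. The Python helper below builds such a connection string as a plain string. The listener name AGListener, the database SalesDb, port 1433, and the use of integrated security are hypothetical values, and the keyword spelling follows ADO.NET-style connection strings as an assumption; other client drivers may spell these options differently.

```python
def listener_connection_string(
    listener: str,
    database: str,
    port: int = 1433,
    multi_subnet_failover: bool = True,
) -> str:
    """Build an ADO.NET-style connection string that targets an AG listener."""
    parts = [
        f"Server=tcp:{listener},{port}",   # the listener's virtual network name
        f"Database={database}",            # must be a database in the group
        "Integrated Security=SSPI",
        f"MultiSubnetFailover={'True' if multi_subnet_failover else 'False'}",
    ]
    return ";".join(parts)


example = listener_connection_string("AGListener", "SalesDb")
```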
Application Intent Filtering

While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or whether it will exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.

For the primary role and secondary role of each availability replica, you can also specify a connection access property that will be used as a connection-level filter on the client's application intent. By default, invalid application intent and connection access combinations result in a refused connection.

SQL Server should filter out client connection requests using the following rules.

While the availability replica is in the primary role, and connection access is equal to:
• Allow any application intent. Do not filter any client connections for application intent.
• Allow only explicit read/write intent. If client specifies read-only, reject connection.

While the availability replica is in the secondary role, and connection access is equal to:
• No connections allowed. Refuse all connections; replica is used only for disaster recovery.
• Allow any application intent. Do not filter any client connections for application intent.
• Read-only application intent. If client does not specify read-only, reject connection.
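The filtering rules above can be expressed as a small decision function. This is a minimal Python sketch of the rules as stated, not SQL Server's implementation. The string values ALL, READ_WRITE, NO, and READ_ONLY are chosen to mirror the values of the T-SQL ALLOW_CONNECTIONS option, and the function name and role labels are assumptions made for this example.

```python
def connection_allowed(role, connection_access, application_intent="ReadWrite"):
    """Return True if a connection passes the application intent filter.

    role:               "PRIMARY" or "SECONDARY" (current role of the replica)
    connection_access:  "ALL" or "READ_WRITE" for the primary role;
                        "NO", "ALL", or "READ_ONLY" for the secondary role
    application_intent: "ReadWrite" (the client default) or "ReadOnly"
    """
    if role == "PRIMARY":
        if connection_access == "ALL":
            return True                               # no filtering on intent
        if connection_access == "READ_WRITE":
            return application_intent != "ReadOnly"   # reject read-only intent
    elif role == "SECONDARY":
        if connection_access == "NO":
            return False                              # disaster recovery only
        if connection_access == "ALL":
            return True                               # no filtering on intent
        if connection_access == "READ_ONLY":
            return application_intent == "ReadOnly"   # intent must be explicit
    raise ValueError(f"unsupported combination: {role}/{connection_access}")


# A client that does not state its intent defaults to read-write and is
# therefore refused by a secondary that accepts read-only intent only.
print(connection_allowed("SECONDARY", "READ_ONLY"))              # False
print(connection_allowed("SECONDARY", "READ_ONLY", "ReadOnly"))  # True
```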
For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).

Application Intent Read-Only Routing

A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replicas.

Workloads that can be readily adapted to run off of a read-only secondary replica include: reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad-hoc queries.

For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.

Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function.

For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
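The routing list behavior described above amounts to a first-match scan. The following minimal Python sketch models it under stated assumptions: the endpoint URLs, the replica dictionary layout, and the online flag are hypothetical, and returning None when nothing matches is only this sketch's way of signalling that no secondary qualified.

```python
def route_read_only(routing_list, replicas):
    """Pick the first endpoint in the routing list that can serve read-only intent.

    routing_list: endpoints in priority order, as configured for the replica
                  that currently holds the primary role
    replicas:     endpoint -> {"online": bool, "secondary_access": str}
    """
    for endpoint in routing_list:
        replica = replicas.get(endpoint)
        if replica is None or not replica["online"]:
            continue                      # unavailable, try the next entry
        if replica["secondary_access"] in ("ALL", "READ_ONLY"):
            return endpoint               # satisfies the intent filter
    return None                           # no qualifying secondary in this sketch


# Hypothetical endpoints, for illustration only.
replicas = {
    "tcp://node2.example.test:1433": {"online": False, "secondary_access": "READ_ONLY"},
    "tcp://node3.example.test:1433": {"online": True, "secondary_access": "NO"},
    "tcp://node4.example.test:1433": {"online": True, "secondary_access": "ALL"},
}
routing_list = list(replicas)
target = route_read_only(routing_list, replicas)
print(target)   # node2 is offline and node3 refuses connections, so node4 is chosen
```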
Availability Improvements - Databases

SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities.

The following improvement reduces recovery time:
• Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based upon estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint, and increasing recovery time (RTO) predictability. Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times. For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
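The per-database target recovery time is set with the TARGET_RECOVERY_TIME option of ALTER DATABASE. The following minimal Python sketch only builds that T-SQL statement as text; the database name SalesDb and the 60-second target are assumptions chosen for illustration, not recommendations from this paper.

```python
def set_target_recovery_time(database, seconds):
    """Build the T-SQL statement that sets a database's target recovery time."""
    if not isinstance(seconds, int) or seconds < 0:
        raise ValueError("seconds must be a non-negative integer")
    # Quote the identifier with brackets, doubling any closing bracket.
    quoted = "[" + database.replace("]", "]]") + "]"
    return f"ALTER DATABASE {quoted} SET TARGET_RECOVERY_TIME = {seconds} SECONDS;"


# Hypothetical database name and target, for illustration only.
statement = set_target_recovery_time("SalesDb", 60)
print(statement)
```

The generated statement would still need to be executed against the instance by whichever client library or tool is in use.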

These improvements mitigate common scenarios that can drive planned downtime:
• Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
• Online schema modification for new NOT NULL columns. If a new NOT NULL column is added with a default value to a SQL Server 2012 database table, only a schema lock is required to update system metadata; all rows do not have to be populated during the ALTER TABLE statement. SQL Server will physically persist the default column value only if a row is actually modified or re-indexed. Queries return the default value from metadata, unless an actual column value exists.
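The behavior of a new NOT NULL column with a default can be pictured with a toy model. The following Python sketch is a conceptual illustration only and makes no claim about how the storage engine is implemented; the Table class, its methods, and the sample rows are all hypothetical.

```python
class Table:
    """Toy model: column defaults live in metadata until a row is modified."""

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]   # physically stored values
        self.defaults = {}                        # metadata: column -> default

    def add_not_null_column(self, name, default):
        # Metadata-only change: no existing row is rewritten.
        self.defaults[name] = default

    def read(self, index, column):
        row = self.rows[index]
        # Return the stored value if one exists, else the metadata default.
        return row[column] if column in row else self.defaults[column]

    def update(self, index, **values):
        row = self.rows[index]
        # Modifying a row persists any defaults it was still missing.
        for name, default in self.defaults.items():
            row.setdefault(name, default)
        row.update(values)


orders = Table([{"id": 1}, {"id": 2}])
orders.add_not_null_column("status", "new")
print(orders.rows[0])               # stored row is unchanged by the schema change
print(orders.read(0, "status"))     # value is served from metadata
orders.update(1, id=20)
print(orders.rows[1])               # default is now physically stored in this row
```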

There is an example of broader support for storage scenarios:
• Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica. Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas. For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).
Client Connectivity Recommendations

Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012 AlwaysOn technologies:
• AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.02, and the SQL Native Client 11.0.
• Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener or the FCI that has IP addresses in multiple subnets.
• Connection provider property: ApplicationIntent=ReadOnly. Where practical, offload read-only workloads from your primary replica onto the secondary replicas.
• Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts, so when multiple IP addresses are present, they try to connect to each of them sequentially, until they encounter a TCP timeout, or until they make a successful connection. You should adjust your connection timeout on legacy clients to accommodate the potential sequential timeouts and retries when multiple IP addresses are present, to a value that is at least 15 seconds plus 21 seconds for every secondary replica.
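The legacy client timeout guidance above is simple arithmetic, worked here as a minimal Python sketch. The function name and the example replica counts are assumptions made for illustration; the 15-second base and the 21 seconds per secondary replica come from the guideline itself.

```python
def legacy_connect_timeout_seconds(secondary_replicas):
    """Minimum connection timeout for a legacy client, per the guideline:
    15 seconds plus 21 seconds for every secondary replica."""
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must not be negative")
    return 15 + 21 * secondary_replicas


# Example: an availability group with two secondary replicas.
timeout = legacy_connect_timeout_seconds(2)
print(timeout)   # 15 + 21 * 2 = 57
```

Because the guideline states a lower bound, a deployment may reasonably choose a larger value.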
Conclusion

This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).

SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios, in a manner that can be well justified using RPO and RTO goals.

For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Version 1.1, 21 February 2012.