Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery

LeRoy Tuttle, Jr.

Quick Guide

Microsoft



Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra

Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones

Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.

Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper (link to source content)
E-book publication date: May 2012
32 pages


Copyright 2012 by Microsoft Corporation

All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. This book expresses the author's views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers, or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

Contents

High Availability and Disaster Recovery Concepts
    Describing High Availability
        Planned vs. Unplanned Downtime
        Degraded Availability
    Quantifying Downtime
        Recovery Objectives
        Justifying ROI or Opportunity Cost
        Monitoring Availability Health
        Planning for Disaster Recovery
Overview: High Availability with Microsoft SQL Server 2012
    SQL Server AlwaysOn
    Significantly Reduce Planned Downtime
    Eliminate Idle Hardware and Improve Cost Efficiency and Performance
    Easy Deployment and Management
    Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
    Infrastructure Availability
        Windows Operating System
        Windows Server Failover Clustering
        WSFC Cluster Validation Wizard
        WSFC Quorum Modes and Voting Configuration
        WSFC Disaster Recovery through Forced Quorum
    SQL Server Instance-Level Protection
        Availability Improvements: SQL Server Instances
        AlwaysOn Failover Cluster Instances
    Database Availability
        AlwaysOn Availability Groups
        Availability Group Failover
        Availability Group Listener
        Availability Improvements: Databases
    Client Connectivity Recommendations
Conclusion

High Availability and Disaster Recovery Concepts

You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders have a shared understanding of the related business drivers and challenges, and of the objectives for planning, managing, and measuring the recovery time objective (RTO) and recovery point objective (RPO). Readers who are familiar with these concepts can move ahead to the "Overview: High Availability with Microsoft SQL Server 2012" section of this paper.

Describing High Availability

For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.

The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.

A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:

$$\text{Availability} = \frac{\text{Actual uptime}}{\text{Expected uptime}} \times 100\%$$

The resulting value is often expressed by industry in terms of the number of 9s that the solution provides, which conveys an annual number of minutes of possible uptime or, conversely, minutes of downtime.

Number of 9s | Availability Percentage | Total Annual Downtime
2            | 99%                     | 3 days, 15 hours
3            | 99.9%                   | 8 hours, 45 minutes
4            | 99.99%                  | 52 minutes, 34 seconds
5            | 99.999%                 | 5 minutes, 15 seconds
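As a worked check of the availability calculation and the table above, this Python sketch converts a number of nines into the maximum annual downtime it allows. It assumes a 365-day year (leap years ignored), which is the assumption that reproduces the table's figures; the function names are illustrative only.

```python
from datetime import timedelta

# Assumption: a 365-day year (leap years ignored); this reproduces the table figures.
MINUTES_PER_YEAR = 365 * 24 * 60


def availability_percent(actual_uptime: float, expected_uptime: float) -> float:
    """Availability = actual uptime / expected uptime x 100% (any consistent unit)."""
    return actual_uptime / expected_uptime * 100.0


def annual_downtime(nines: int) -> timedelta:
    """Maximum annual downtime for an availability of `nines` nines (e.g. 3 -> 99.9%)."""
    unavailable_fraction = 10.0 ** -nines  # 3 nines -> 0.1% of the year may be down
    return timedelta(minutes=MINUTES_PER_YEAR * unavailable_fraction)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {100 - 100 * 10.0 ** -n:.3f}% -> {annual_downtime(n)}")
```

For example, `annual_downtime(3)` is 525.6 minutes, or about 8 hours 45 minutes, matching the three-nines row.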

Planned vs. Unplanned Downtime

System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be considered negatively if it is appropriately managed. There are two key types of foreseeable downtime:

• Planned maintenance. A time window is preannounced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline reindexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other, potentially more severe, unplanned outage scenarios.

• Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable but considered either too unlikely to occur or to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then reestablishes fault tolerance.
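Because planned and unplanned downtime are managed differently, it helps to record each outage with its type so that availability can be reported for each type on its own. The following Python sketch assumes a hypothetical `Outage` record (start, end, planned flag); it is an illustration of the bookkeeping, not part of any SQL Server tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """Hypothetical outage record: when it started, when it ended, and its type."""
    start: datetime
    end: datetime
    planned: bool

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def downtime_by_type(outages: list[Outage]) -> dict[str, timedelta]:
    """Total downtime, kept separate for planned maintenance and unplanned outages."""
    totals = {"planned": timedelta(0), "unplanned": timedelta(0)}
    for outage in outages:
        totals["planned" if outage.planned else "unplanned"] += outage.duration
    return totals


def availability_kpis(outages: list[Outage], period: timedelta) -> dict[str, float]:
    """Availability percentage over `period`, computed separately for each downtime type."""
    return {
        kind: (1 - downtime / period) * 100.0
        for kind, downtime in downtime_by_type(outages).items()
    }


if __name__ == "__main__":
    may_2012 = [
        Outage(datetime(2012, 5, 1, 2, 0), datetime(2012, 5, 1, 3, 0), planned=True),
        Outage(datetime(2012, 5, 9, 14, 0), datetime(2012, 5, 9, 14, 30), planned=False),
    ]
    print(availability_kpis(may_2012, timedelta(days=31)))
```

Keeping the two totals apart makes it possible to weigh time spent on planned maintenance against unplanned downtime avoided.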

When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.

Degraded Availability

High availability should not be considered an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:

• Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.

• Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be overcommitted or undersized. User experience may suffer, but work may still get done in a less productive manner.

• Partial, transient, or impending failures. Robustness in the application logic or hardware stack allows the system to retry or self-correct upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.

• Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.
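The "partial, transient, or impending failures" case relies on application logic that retries instead of failing outright. One common way to express this is retry with exponential backoff, sketched below in Python. `TransientError` and the `operation` callable are hypothetical stand-ins for whatever error type and database call an application actually uses, and the attempt count and delays are illustrative defaults, not recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable errors, e.g. a connection dropped during failover."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying TransientError with exponentially growing delays.

    Non-transient exceptions propagate immediately; after `max_attempts`
    transient failures the last TransientError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            # Delays of 0.5 s, 1 s, 2 s, ... capped at max_delay (with the defaults).
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the function testable. Production code would typically also add random jitter so that many clients reconnecting after a failover do not retry in lockstep.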

The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.

Quantifying Downtime

When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.

At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and if needed, reestablish fault tolerance.

Recovery Objectives

Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation. You can both measure the impact and set recovery goals in terms of how long it takes to get back in business and how much time latency there is in the last transaction recovered:

• Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.

• Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.
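Both objectives can be measured after the fact from a handful of timestamps. The Python sketch below follows the definitions above: the achieved recovery time runs from the start of the outage until full service (new transactions accepted) is restored, and the achieved recovery point is the gap between the last transaction committed before the failure and the most recent transaction recovered. The `OutageTimeline` fields are hypothetical; in practice they would come from monitoring data and logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageTimeline:
    # Hypothetical timestamps gathered from monitoring data and logs.
    failure_at: datetime                     # when the outage began
    full_service_at: datetime                # when new transactions were accepted again
    last_committed_before_failure: datetime  # newest commit on the failed primary
    last_recovered_after_failure: datetime   # newest commit present after recovery

    def recovery_time(self) -> timedelta:
        """Achieved recovery time: outage duration until full service is restored."""
        return self.full_service_at - self.failure_at

    def recovery_point(self) -> timedelta:
        """Achieved recovery point: latency between last committed and last recovered data."""
        return self.last_committed_before_failure - self.last_recovered_after_failure


def meets_objectives(t: OutageTimeline, rto_goal: timedelta, rpo_goal: timedelta) -> bool:
    """True if this outage stayed within both recovery objectives."""
    return t.recovery_time() <= rto_goal and t.recovery_point() <= rpo_goal
```

Comparing these measured values with the RTO and RPO goals after each incident or rehearsal shows whether the goals are realistic.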

You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.

Justifying ROI or Opportunity Cost

The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:

• Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.

• Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.

• Resource utilization. Rather than sitting idle while awaiting an outage, secondary or standby infrastructure can be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.
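The investment themes above are weighed against the projected cost of downtime, which may accrue with time or be incurred at a certain point in the outage window. The Python sketch below assumes a simple hypothetical cost model: a per-minute cost that accrues continuously, plus one-time penalties triggered at fixed minutes into an outage (for example, a contractual SLA breach). All figures are illustrative, and the model covers a single outage only.

```python
from dataclasses import dataclass, field


@dataclass
class DowntimeCostModel:
    # Hypothetical model: cost accrues per minute, plus step penalties at thresholds.
    cost_per_minute: float
    step_penalties: dict[float, float] = field(default_factory=dict)  # minute -> one-time cost

    def cost(self, minutes_down: float) -> float:
        """Projected business cost after `minutes_down` minutes of a single outage."""
        accrued = self.cost_per_minute * minutes_down
        penalties = sum(p for at, p in self.step_penalties.items() if minutes_down >= at)
        return accrued + penalties


def investment_pays_off(model: DowntimeCostModel, minutes_without: float,
                        minutes_with: float, investment: float) -> bool:
    """True if the investment costs less than the downtime cost it is expected to avoid."""
    return investment < model.cost(minutes_without) - model.cost(minutes_with)


if __name__ == "__main__":
    model = DowntimeCostModel(cost_per_minute=100.0, step_penalties={60.0: 5000.0})
    # Compare a 2-hour manual recovery with a 5-minute automated failover.
    print(investment_pays_off(model, minutes_without=120, minutes_with=5, investment=10000.0))
```

Step penalties capture costs that are incurred at a certain point in the outage window rather than accruing minute by minute.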

For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.

Monitoring Availability Health

From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO. In the event of an outage, you should also limit the initial time spent investigating the root cause, and instead focus on validating the health of your recovery environment; then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.

Planning for Disaster Recovery

While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to reestablish high availability after the outage. As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to pre-established RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:

• Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.

• Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible by appropriate parties.

• Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?

• Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.

• Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?

• Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity so that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a runbook or a cookbook.

• Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regularly rotating the primary production site between the primary site and each of the disaster recovery sites.
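The guidance above ties the failover decision to pre-established RTO and RPO thresholds and uses data latency on the standby as a proxy for RPO. One step of such a decision tree might be sketched in Python as follows. The inputs, the action names, and the rule of escalating to a business decision when a failover would exceed the RPO are hypothetical choices for illustration, not a prescribed procedure.

```python
from datetime import timedelta
from enum import Enum


class Action(Enum):
    CONTINUE_INVESTIGATION = "continue investigating"
    FAIL_OVER = "fail over to the secondary"
    ESCALATE = "escalate for a business decision"


def failover_decision(
    elapsed: timedelta,
    decision_point: timedelta,
    secondary_latency: timedelta,
    rpo: timedelta,
    secondary_healthy: bool,
) -> Action:
    """One hypothetical decision-tree step based on pre-agreed RTO/RPO thresholds.

    decision_point: the pre-agreed time, shorter than the RTO, at which
    investigation stops and recovery begins.
    secondary_latency: data latency on the standby, used as a proxy for the
    data loss a failover would incur right now.
    """
    if not secondary_healthy:
        return Action.ESCALATE            # the recovery environment itself is not valid
    if elapsed < decision_point:
        return Action.CONTINUE_INVESTIGATION
    if secondary_latency <= rpo:
        return Action.FAIL_OVER           # expected data loss is within the RPO
    return Action.ESCALATE                # failing over now would exceed the RPO
```

Encoding the rules this way makes the decision tree repeatable and easy to rehearse, which the runbook and recovery rehearsals above call for.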

Overview: High Availability with Microsoft SQL Server 2012

Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protection of critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping the cost and complexity low. Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the "SQL Server AlwaysOn Layers of Protection" section of this paper.

SQL Server AlwaysOn

AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across data centers, and it improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.

An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:

• AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure availability of application databases, and they enable zero data loss through log-based data movement for data protection without shared disks. Availability groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.

• AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.

Significantly Reduce Planned Downtime

The key reason for application downtime in any organization is planned downtime caused by operating system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the outages in an IT environment. SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:

• Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent.

• Online operations. Enhanced support for online operations, such as LOB reindexing and adding columns with default values, helps to reduce downtime during database maintenance operations.

• Rolling upgrade and patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime.

• SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.

Eliminate Idle Hardware and Improve Cost Efficiency and Performance

Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to simultaneously utilize both the primary and secondary database replicas helps improve performance of all workloads due to better resource balancing across your server hardware investments.

Easy Deployment and Management

Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.

Contrasting RPO and RTO Capabilities

The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution. This table offers a rough comparison of the type of results that those different solutions may achieve:

High Availability and Disaster Recovery SQL Server Solution | Potential Data Loss (RPO) | Potential Recovery Time (RTO) | Automatic Failover | Readable Secondaries (1)
AlwaysOn Availability Group, synchronous commit | Zero | Seconds | Yes (4) | 0-2
AlwaysOn Availability Group, asynchronous commit | Seconds | Minutes | No | 0-4
AlwaysOn Failover Cluster Instance | NA (5) | Seconds to minutes | Yes | NA
Database Mirroring (2), high-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring (2), high-performance (async) | Seconds (6) | Minutes (6) | No | NA
Log Shipping | Minutes (6) | Minutes to hours (6) | No | Not during a restore
Backup, Copy, Restore (3) | Hours (6) | Hours to days (6) | No | Not during a restore

(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
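To make the comparison in the table concrete, the Python sketch below encodes it as data and filters solutions by business requirements. The qualitative ratings ("Zero", "Seconds", "Minutes", and so on) are mapped onto an ordinal scale, which is a simplification introduced here; each solution is rated by the upper end of its range, the workload dependence in note (6) is not modeled, and the FCI's data-loss entry is treated as unknown (and therefore excluded) because it depends on the storage system.

```python
from dataclasses import dataclass
from typing import Optional

# Ordinal scale introduced for this sketch; the table itself only gives qualitative ranges.
SCALE = {"zero": 0, "seconds": 1, "minutes": 2, "hours": 3, "days": 4}


@dataclass(frozen=True)
class Solution:
    name: str
    worst_rpo: Optional[str]  # None where the table says NA (depends on storage)
    worst_rto: str            # upper end of the range, e.g. "minutes" for "seconds to minutes"
    automatic_failover: bool


SOLUTIONS = [
    Solution("AlwaysOn Availability Group, synchronous commit", "zero", "seconds", True),
    Solution("AlwaysOn Availability Group, asynchronous commit", "seconds", "minutes", False),
    Solution("AlwaysOn Failover Cluster Instance", None, "minutes", True),
    Solution("Database Mirroring, high-safety", "zero", "seconds", True),
    Solution("Database Mirroring, high-performance", "seconds", "minutes", False),
    Solution("Log Shipping", "minutes", "hours", False),
    Solution("Backup, Copy, Restore", "hours", "days", False),
]


def candidates(max_rpo: str, max_rto: str, need_automatic_failover: bool) -> list[str]:
    """Names of solutions whose worst-case RPO and RTO fall within the given limits."""
    result = []
    for s in SOLUTIONS:
        if s.worst_rpo is None or SCALE[s.worst_rpo] > SCALE[max_rpo]:
            continue  # unknown or too much potential data loss
        if SCALE[s.worst_rto] > SCALE[max_rto]:
            continue  # recovery could take too long
        if need_automatic_failover and not s.automatic_failover:
            continue
        result.append(s.name)
    return result
```

For example, `candidates("zero", "seconds", True)` keeps only the synchronous-commit availability group and high-safety database mirroring; per note (2), the availability group is the forward-looking choice of the two.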

SQL Server AlwaysOn Layers of Protection

SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been common practice to separate duties and responsibilities among the various audiences and roles involved, so that each was predominantly concerned with only a portion of those solution layers. This section of the paper walks through a deeper description of each of those layers and offers rationale and guidance for your design discussions and implementation decisions.

A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:

• Infrastructure level. Server-level fault tolerance and inter-node network communication leverage Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.

• SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across, and can fail over to, server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).

• Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.

• Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
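The database-level and client-connectivity rules above can be expressed as a small data model: an availability group has one primary replica and one to four secondary replicas, each hosted on a different WSFC node, and a listener name resolves to whichever instance currently hosts the primary. The classes and the `connect_target` function below are hypothetical Python illustrations of these rules, not a SQL Server API. The listener is modeled as a plain dictionary from a virtual network name to an availability group, and routing of read-only connections to secondaries is left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    instance_name: str  # SQL Server instance network name (FCI or non-FCI)
    node: str           # WSFC cluster node hosting the instance


@dataclass
class AvailabilityGroup:
    primary: Replica
    secondaries: list[Replica]

    def __post_init__(self) -> None:
        if not 1 <= len(self.secondaries) <= 4:
            raise ValueError("an availability group has one to four secondary replicas")
        nodes = [self.primary.node] + [r.node for r in self.secondaries]
        if len(set(nodes)) != len(nodes):
            raise ValueError("each replica must be hosted on a different WSFC node")

    def fail_over(self, new_primary: Replica) -> None:
        """Swap roles: the chosen secondary becomes primary, the old primary a secondary."""
        self.secondaries.remove(new_primary)  # raises ValueError if it is not a secondary
        self.secondaries.append(self.primary)
        self.primary = new_primary


def connect_target(listener: dict[str, AvailabilityGroup], vnn: str) -> str:
    """Resolve a listener virtual network name to the current primary's instance name."""
    return listener[vnn].primary.instance_name
```

Because clients only know the listener name, a failover changes what `connect_target` returns without any change to the client's connection target.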

The logical topology of a representative AlwaysOn solution is illustrated in this diagram:

[Figure: A Windows Server Failover Clustering (WSFC) cluster spanning Network Subnet A (Nodes A1, A2, A3) and Network Subnet B (Nodes B1, B2), each node holding the WSFC configuration. SQL Server Failover Cluster Instance 1 uses shared storage; SQL Server Instances 2, 3, and 4 each use their own storage, and every instance has an instance network name. An AlwaysOn availability group has one primary replica and three secondary replicas, with an availability group listener exposed through an availability group virtual network name. A remote file share serves as an optional WSFC quorum witness.]

Infrastructure Availability

Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.

Windows Operating System

SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring. The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server 2008 R2 Standard, Windows Server 2008 R2 Enterprise, and Windows Server 2008 R2 Datacenter.

For more information, see: Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).

Windows Server Core Installation Option

As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command prompt environment are enabled. This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.

A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.

Optimizing SQL Server for Private Cloud

High availability and disaster recovery scenarios are increasingly critical in the Private Cloud environment. Deploy SQL Server to your Private Cloud to help ensure that your compute, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. A Private Cloud deployment helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.

In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering. For more information, see Private Cloud Computing: Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).

Windows Server Failover Clustering

Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server. If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies both to FCIs and to availability groups.

The nodes in the WSFC cluster work together to collectively provide these types of capabilities:

• Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.

• Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.

• Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.

• Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
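The statement that overall cluster health is determined by the votes of a quorum of nodes can be illustrated with the simplest voting rule, a strict majority of all configured votes. This Python sketch models only that rule, in the spirit of a node-majority configuration; WSFC supports several quorum modes and voting configurations (including witness votes) that are not modeled here, and the member names are hypothetical.

```python
def has_quorum(votes: dict[str, int], reachable: set[str]) -> bool:
    """Simplified majority rule: the reachable members must hold more than half
    of all configured votes for the cluster to keep running."""
    total = sum(votes.values())
    present = sum(v for member, v in votes.items() if member in reachable)
    return present * 2 > total


if __name__ == "__main__":
    # Five voting nodes split across two subnets, as in the topology diagram.
    votes = {"A1": 1, "A2": 1, "A3": 1, "B1": 1, "B2": 1}
    print(has_quorum(votes, {"A1", "A2", "A3"}))  # subnet A side of a network split
    print(has_quorum(votes, {"B1", "B2"}))        # subnet B side of the same split
```

If a network failure splits the five nodes three and two, only the three-node side holds a majority, so only that side keeps hosting clustered resources; this is why the placement of votes and witnesses matters for cross-site disaster recovery.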

For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failoverclusteringmain.aspx).

Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.

WSFC Storage Configurations

Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust; therefore, if the storage device attached to a node is unavailable, the cluster node is considered to be at fault. For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.

  • MicrosoftSQLServerAlwaysOnSolutionsGuideforHighAvailabilityandDisasterRecovery 10

    SQLServerAlwaysOnsolutionsbothleverageandarerestrictedtocertainWSFCstorageconfigurationcombinations,including: Directattachedvs.remote.Storagedevicesaredirectlyphysicallyattachedtotheserver,orthey

    arepresentedbyaremotedevicethroughanetworkorhostbusadaptor(HBA).RemotestoragetechnologiesincludeStorageAreaNetwork(SAN)basedsolutionssuchasiSCSIorFibreChannel,aswellasServerMessagingBlock(SMB)filesharebasedsolutions.

    Symmetricvs.asymmetric.Storagedevicesareconsideredsymmetricifexactlythesamelogicaldiskvolumeconfigurationandfilepathsarepresentedtoeachnodeinthecluster.Thephysicalimplementationandcapacityoftheunderlyingdiskvolumescanvary.

    Dedicated vs. shared. Dedicated storage is reserved for use by, and assigned to, a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared volume.
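    The symmetric vs. asymmetric distinction can be checked mechanically. The Python sketch below assumes a hypothetical inventory that maps each node name to the set of logical volume paths presented to it; it is not a WSFC API, and, like the definition above, it deliberately ignores physical implementation and capacity.

```python
from typing import Dict, Set


def is_symmetric(presented: Dict[str, Set[str]]) -> bool:
    """True if every node is presented exactly the same logical volume paths.

    `presented` is a hypothetical inventory: node name -> set of paths that
    node sees. Capacity and physical layout are intentionally not compared.
    """
    path_sets = list(presented.values())
    if not path_sets:
        return True
    first = path_sets[0]
    return all(paths == first for paths in path_sets[1:])


# Same logical paths on both nodes: symmetric (what an FCI requires).
fci_nodes = {
    "NODE1": {r"S:\SQLData", r"L:\SQLLogs"},
    "NODE2": {r"S:\SQLData", r"L:\SQLLogs"},
}
# Each node has its own local layout: asymmetric (acceptable for AG replicas).
ag_nodes = {
    "NODE1": {r"D:\Data"},
    "NODE3": {r"E:\Data"},
}
print(is_symmetric(fci_nodes))  # True
print(is_symmetric(ag_nodes))   # False
```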

    Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated, local or remote storage.

    WSFC Resource Health Detection and Failover

    Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A variety of circumstances may indicate a cluster resource failure, including power failure, disk or memory errors, network communication errors, misconfiguration, or nonresponsive services.

    You can make WSFC cluster resources such as networks, storage, or services dependent upon one another. The cumulative health of a resource is determined by successive rollup of its health with the health of each of its resource dependencies.

    For AlwaysOn Availability Groups, the availability group and the availability group listener are registered as WSFC cluster resources. For AlwaysOn Failover Cluster Instances, the SQL Server service and the SQL Server Agent service are registered as WSFC cluster resources, and both are made dependent upon the instance's virtual network name resource.

    If a WSFC cluster resource experiences a set number of errors or failures over a period of time, the configured failover policy causes the cluster service to do one of the following:

    • Restart the resource on the current node.
    • Set the resource offline.
    • Initiate an automatic failover of the resource and its dependencies to another node.
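    The dependency rollup and the failure-count policy described above can be sketched in a few lines of Python. This is a simplified model, not the WSFC implementation: the resource names, the `max_failures` and `period_s` thresholds, and the single configurable action taken once the threshold is exceeded are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


def rollup_health(resource: str, own_health: Dict[str, bool],
                  depends_on: Dict[str, List[str]]) -> bool:
    """Healthy only if the resource and every dependency below it are healthy.

    Assumes the dependency graph is acyclic.
    """
    if not own_health[resource]:
        return False
    return all(rollup_health(dep, own_health, depends_on)
               for dep in depends_on.get(resource, []))


@dataclass
class FailurePolicy:
    max_failures: int                # failures tolerated inside the window (hypothetical)
    period_s: float                  # sliding window length, in seconds
    on_threshold: str = "failover"   # action once exceeded: "failover" or "offline"
    _failures: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def record_failure(self, now: float) -> str:
        """Record a failure at time `now` and return the action to take."""
        self._failures.append(now)
        # Forget failures that have fallen out of the window.
        while now - self._failures[0] > self.period_s:
            self._failures.popleft()
        if len(self._failures) <= self.max_failures:
            return "restart"         # restart the resource on the current node
        return self.on_threshold


deps = {
    "SQL Server": ["Virtual Network Name"],
    "SQL Server Agent": ["Virtual Network Name"],
    "Virtual Network Name": ["IP Address"],
}
health = {"SQL Server": True, "SQL Server Agent": True,
          "Virtual Network Name": True, "IP Address": False}
print(rollup_health("SQL Server", health, deps))  # False: the IP address below it failed

policy = FailurePolicy(max_failures=1, period_s=900)
print(policy.record_failure(now=0))    # restart
print(policy.record_failure(now=60))   # failover
```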


    Note: WSFC cluster resource health detection has no direct impact on the individual nodes' health or the overall health of the cluster.

    WSFC Cluster Validation Wizard

    The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server 2008 and Windows Server 2008 R2. It is a key tool that a database administrator can use to help ensure that a clean, healthy, stable WSFC environment exists before deploying a SQL Server AlwaysOn solution.

    With the cluster validation wizard, you can run a set of focused tests on either a collection of servers that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC cluster would be supported on a given configuration.

    This validation process consists of a series of tests and data collection on each node in these categories:

    • Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.

    • Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.

    • Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.

    • System configuration. Validates Active Directory configuration, driver signing, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.

    The results of these validation tests give you the information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference. You should run these tests before and after you make any changes to the WSFC configuration, before you install SQL Server, and as a part of any disaster recovery process.

    A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration. For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).

    Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps. For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).


    WSFC Quorum Modes and Voting Configuration

    WSFC uses a quorum-based approach to monitor overall cluster health and maximize node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.

    Cluster Health Detection by Quorum

    Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.

    A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance. The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to.

    If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.

    Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.

    Quorum Modes

    A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster. One of the following quorum modes determines what constitutes a quorum of votes:

    • Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.

    • Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than one-half of the possible votes must be affirmative for the cluster to be healthy. As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.

    • Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than one-half of the possible votes must be affirmative for the cluster to be healthy.


    • Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.
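    The vote arithmetic behind these quorum modes reduces to a strict-majority test. The following Python sketch is a simplified model under stated assumptions: the mode labels are this sketch's own strings, every voting node and witness carries exactly one vote, and Disk Only is treated as depending solely on witness connectivity.

```python
def has_quorum(mode: str, voting_nodes_up: int, voting_nodes_total: int,
               witness_reachable: bool = False) -> bool:
    """Return True if the affirmative votes form a quorum under `mode`.

    `mode` is one of the hypothetical labels "NodeMajority",
    "NodeAndFileShareMajority", "NodeAndDiskMajority", or "DiskOnly".
    """
    if mode == "DiskOnly":
        # Only connectivity to the shared witness disk counts.
        return witness_reachable
    if mode == "NodeMajority":
        possible, affirmative = voting_nodes_total, voting_nodes_up
    elif mode in ("NodeAndFileShareMajority", "NodeAndDiskMajority"):
        # The witness contributes one extra possible vote.
        possible = voting_nodes_total + 1
        affirmative = voting_nodes_up + (1 if witness_reachable else 0)
    else:
        raise ValueError(f"unknown quorum mode: {mode}")
    # Strictly more than half of the possible votes must be affirmative.
    return 2 * affirmative > possible


print(has_quorum("NodeMajority", 2, 3))                     # True: 2 of 3
print(has_quorum("NodeMajority", 2, 4))                     # False: 2 of 4 is a tie
print(has_quorum("NodeAndFileShareMajority", 2, 4, True))   # True: 3 of 5
```

    The second and third calls show why a witness is recommended for an even number of voting nodes: the extra vote turns a possible tie into a decisive majority.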

    For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).

    Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.

    Voting and Non-Voting Nodes

    By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.

    Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of that node.

    For all of the quorum modes except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You should trust the quorum vote when all nodes are on the same physical subnet. However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.

    If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.

    Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.

    To simplify your quorum configuration and increase uptime, you may want to adjust each node's NodeWeight setting (a value of 0 or 1). A node whose NodeWeight is 0 does not have its vote counted towards the quorum.
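    The split-brain reasoning above, and the effect of a NodeWeight of 0, can be illustrated with a small Python model. It assumes a hypothetical cluster whose nodes are divided into disjoint network partitions and simply counts votes per partition; it does not model forced quorum, which is exactly the operation that bypasses this majority test.

```python
from typing import Dict, List, Set


def partitions_with_quorum(node_weight: Dict[str, int],
                           partitions: List[Set[str]]) -> List[int]:
    """Indexes of the partitions whose votes are a strict majority of all votes.

    Nodes with NodeWeight 0 remain cluster members but add no votes. Because
    the partitions are disjoint, at most one of them can hold a strict
    majority, so a normal quorum vote cannot produce a split brain.
    """
    total = sum(node_weight.values())
    winners = []
    for i, part in enumerate(partitions):
        votes = sum(node_weight[n] for n in part)
        if 2 * votes > total:
            winners.append(i)
    return winners


# Primary site: three voting nodes. DR site: one node with NodeWeight 0.
weights = {"NODE1": 1, "NODE2": 1, "NODE3": 1, "DR1": 0}
# The inter-subnet link fails, splitting the cluster in two.
print(partitions_with_quorum(weights, [{"NODE1", "NODE2", "NODE3"}, {"DR1"}]))  # [0]

# Four voting nodes split evenly: no side has a majority, so the cluster goes offline.
even = {"A": 1, "B": 1, "C": 1, "D": 1}
print(partitions_with_quorum(even, [{"A", "B"}, {"C", "D"}]))  # []
```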


    Recommended Adjustments to Quorum Voting

    To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:

    1. No vote by default. Assume that each node should not vote without explicit justification.

    2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.

    3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.

    4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.

    5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster, and adjust the quorum mode to prevent possible ties in the quorum vote.

    6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.
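    The six guidelines can be expressed as a small decision function. The Python sketch below is one possible reading of guidelines 1 through 5 under assumptions: the `Node` fields are hypothetical flags that you would fill in from your own topology, a witness is assumed to add exactly one vote, and guideline 6 corresponds to running the function again with the post-failover roles.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    hosts_primary: bool = False         # hosts an AG primary replica or is the FCI preferred owner
    auto_failover_target: bool = False  # could host a primary replica or FCI after automatic failover
    at_dr_site: bool = False            # resides at a secondary disaster recovery site


def recommend_votes(nodes: List[Node]) -> Tuple[Dict[str, int], bool]:
    """Return (NodeWeight per node, whether a witness is needed to avoid ties)."""
    weights: Dict[str, int] = {}
    for n in nodes:
        vote = 0                          # guideline 1: no vote by default
        if n.hosts_primary:               # guideline 2: include primary nodes
            vote = 1
        if n.auto_failover_target:        # guideline 3: include automatic failover owners
            vote = 1
        if n.at_dr_site:                  # guideline 4: exclude secondary site nodes (general rule)
            vote = 0
        weights[n.name] = vote
    needs_witness = sum(weights.values()) % 2 == 0  # guideline 5: keep the vote count odd
    return weights, needs_witness


nodes = [
    Node("NODE1", hosts_primary=True),
    Node("NODE2", auto_failover_target=True),
    Node("NODE3", at_dr_site=True),
]
weights, needs_witness = recommend_votes(nodes)
print(weights)        # {'NODE1': 1, 'NODE2': 1, 'NODE3': 0}
print(needs_witness)  # True: two votes could tie 1-1, so add a witness
```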

    For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx). You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.

    Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting. For more information, see Monitor Availability Groups (http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).


    WSFC Disaster Recovery through Forced Quorum

    Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and availability groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.

    To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology.

    You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster. This type of disaster recovery process should include the following steps:

    1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.

    2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica. For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).

    Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.

    3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes. As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.

    Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is a split-brain scenario. If your findings in step 1 are accurate, this should not occur.


    4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration. Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.

    Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.

    5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps. You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.

    6) Repair or replace failed components and revalidate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.

    Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow and you will run the risk of running out of transaction log space on the other replicas.

    7) Repeat step 4 as needed. The goal is to reestablish the appropriate level of fault tolerance and high availability for healthy operations.

    8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document the actual Recovery Point and Recovery Time experienced.
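    Two decisions in this procedure are mechanical enough to sketch: the vote adjustment in step 4 and the replica recovery sequence in step 5. The Python below assumes hypothetical node and replica names and a simple role label per replica; it only computes an order and a set of NodeWeight values and does not touch a real cluster.

```python
from typing import Dict, List, Set, Tuple

# Step 5 order: a lower number means bring the replica back online earlier.
RECOVERY_PRIORITY = {"primary": 0, "synchronous": 1, "asynchronous": 2}


def recovery_order(replicas: List[Tuple[str, str]]) -> List[str]:
    """Order (replica name, role) pairs for recovery.

    sorted() is stable, so replicas with the same role keep their input order.
    """
    ordered = sorted(replicas, key=lambda r: RECOVERY_PRIORITY[r[1]])
    return [name for name, _role in ordered]


def post_disaster_weights(weights: Dict[str, int], recovered: Set[str]) -> Dict[str, int]:
    """Step 4: keep the vote of recovered nodes and set every other node's vote to 0."""
    return {node: (w if node in recovered else 0) for node, w in weights.items()}


print(recovery_order([("SQL3", "asynchronous"), ("SQL2", "synchronous"), ("SQL1", "primary")]))
# ['SQL1', 'SQL2', 'SQL3']
print(post_disaster_weights({"NODE1": 1, "NODE2": 1, "NODE3": 1}, {"NODE1", "NODE3"}))
# {'NODE1': 1, 'NODE2': 0, 'NODE3': 1}
```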


    SQL Server Instance-Level Protection

    The next layer of protection in an AlwaysOn solution is the data platform itself: the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.

    Availability Improvements: SQL Server Instances

    These are new SQL Server 2012 instance-level features that enhance availability both for AlwaysOn Failover Cluster Instances and for stand-alone instances that host AlwaysOn Availability Groups. These improvements represent enhancements for managing and troubleshooting failover scenarios:

    • Flexible Failover Policy. The new system stored procedure sp_server_diagnostics, used for robust failure detection, reports the severity of failures affecting the SQL Server instance. The FailureConditionLevel property of a WSFC failover policy governs how that severity affects the SQL Server instance, ranging from relative tolerance of errors to sensitivity to any SQL Server internal component error. You can configure failover to be triggered by any one of a range of error levels: server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies. Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover. For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).

    • Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that capture and dump information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies. For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).

    • SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both stand-alone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation, or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can very nearly approximate that of direct-attached storage. For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sqldatabasesonfilesharesitstimetoreconsiderthescenario.aspx).


    Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.

    • WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.
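    The Flexible Failover Policy levels form a cumulative scale: each higher FailureConditionLevel also reacts to every condition covered by the lower levels. The Python sketch below models that comparison. The numeric values 0 through 5 follow the SQL Server 2012 documentation for FailureConditionLevel (0 means no automatic failover or restart); the condition labels and the `triggers_failover` helper are hypothetical and do not reproduce the actual output format of sp_server_diagnostics.

```python
from enum import IntEnum


class FailureConditionLevel(IntEnum):
    NO_AUTOMATIC_FAILOVER = 0
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


# Hypothetical condition labels, each mapped to the lowest level that reacts to it.
LOWEST_REACTING_LEVEL = {
    "server_down": FailureConditionLevel.SERVER_DOWN,
    "server_unresponsive": FailureConditionLevel.SERVER_UNRESPONSIVE,
    "critical_error": FailureConditionLevel.CRITICAL_ERROR,
    "moderate_error": FailureConditionLevel.MODERATE_ERROR,
    "qualified_error": FailureConditionLevel.ANY_QUALIFIED_ERROR,
}


def triggers_failover(configured: FailureConditionLevel, condition: str) -> bool:
    """True if a policy set to `configured` reacts to `condition`.

    Levels are cumulative, so a condition triggers the policy whenever its
    lowest reacting level is at or below the configured level. Level 0 never
    triggers because every condition maps to level 1 or higher.
    """
    return LOWEST_REACTING_LEVEL[condition] <= configured


policy = FailureConditionLevel.CRITICAL_ERROR
print(triggers_failover(policy, "server_unresponsive"))  # True
print(triggers_failover(policy, "moderate_error"))       # False
```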

    AlwaysOn Failover Cluster Instances

    The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.

    An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but is only active on one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.

    Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node. Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence. Database files are stored on shared symmetrical storage volumes that are registered as resources with the WSFC cluster and are owned by the node that currently hosts the FCI. For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).

    FCI Failover Process

    If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to perform a failover:

    1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A time-out in the restart attempt indicates a resource failure.

    2) A failover is indicated. A Failover Policy check indicates the need for a node failover.

    3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server service is attempted.

    4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.


    5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending time-out period, the cluster service puts the resource on this new node in a failed state.

    6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.
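    The six-step FCI failover process can be traced with a small simulation. In the Python sketch below, the `restart_ok` and `start_ok` callbacks are hypothetical stand-ins for the restart time-out and the pending time-out; the sketch moves ownership only to the next node in the preferred owner list and does not model what the cluster service does after a failed start.

```python
from typing import Callable, List, Tuple


def next_preferred_owner(current: str, preferred: List[str]) -> str:
    """Next node after `current` in the preferred owner list, wrapping around."""
    i = preferred.index(current)
    return preferred[(i + 1) % len(preferred)]


def fci_failover(owner: str, preferred: List[str],
                 restart_ok: Callable[[str], bool],
                 start_ok: Callable[[str], bool]) -> Tuple[str, str, List[str]]:
    """Return (hosting node, final state, steps taken)."""
    steps = [f"restart attempted on {owner}"]                 # step 1
    if restart_ok(owner):
        return owner, "online", steps
    steps.append("failover indicated by policy")              # step 2
    steps.append(f"SQL Server stopped on {owner}")            # step 3
    target = next_preferred_owner(owner, preferred)
    steps.append(f"resource group moved to {target}")         # step 4
    steps.append(f"SQL Server starting on {target}")          # step 5
    if not start_ok(target):
        return target, "failed", steps                        # pending time-out expired
    steps.append(f"user databases recovered on {target}")     # step 6
    return target, "online", steps


owner, state, steps = fci_failover(
    "NODE1", ["NODE1", "NODE2", "NODE3"],
    restart_ok=lambda node: False,   # the restart on NODE1 times out
    start_ok=lambda node: True,      # NODE2 starts within the pending time-out
)
print(owner, state)  # NODE2 online
for s in steps:
    print(s)
```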

    FCI Improvements

    Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability robustness and serviceability:

    • Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency. Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.

    Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes. For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sqlserver2012alwayson_3a00_multisitefailoverclusterinstance.aspx).

    • Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information. Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query time-out, for whatever reason, triggered a failover with very little available diagnostic information. For more information, see sp_server_diagnostics (http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).
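    The difference between the older all-interfaces requirement and the SQL Server 2012 OR dependency amounts to `all()` versus `any()`. The Python sketch below uses hypothetical IP addresses on two subnets to show why an OR dependency lets an instance start at a site where only one subnet is reachable; it models the rule only, not the WSFC resource API.

```python
from typing import Dict


def can_start(ip_online: Dict[str, bool], dependency: str) -> bool:
    """Evaluate the instance's network dependency.

    "AND": every network interface / IP resource must be online (pre-2012 rule).
    "OR":  any single online interface is enough (SQL Server 2012 multi-subnet FCI).
    """
    if dependency == "AND":
        return all(ip_online.values())
    if dependency == "OR":
        return any(ip_online.values())
    raise ValueError(f"unknown dependency type: {dependency}")


# After a site failure, only the IP on the surviving subnet can come online.
ips = {"10.0.1.20 (site A)": False, "10.0.2.20 (site B)": True}
print(can_start(ips, "AND"))  # False
print(can_start(ips, "OR"))   # True
```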

    There is now broader support for FCI storage scenarios:

    • Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks, and all disks mounted to them, are automatically added to the SQL Server resource dependency during setup.

    • tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.


    Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with the other system databases.

    Note: The location of tempdb is stored in the master database, which moves between nodes during failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.


    Database Availability

    The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.

    AlwaysOn Availability Groups

    An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.

    AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage. For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).

    Availability Replicas and Roles

    Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.

    One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations. An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.

    Availability Replica Synchronization

    The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.

    Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes. In this manner, new entries in the primary replica's transaction log are appended onto each of the secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.
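    The LSN watermarks that secondary replicas report are also what limit log truncation on the primary, as noted in step 6 of the forced quorum procedure. The Python sketch below models LSNs as plain integers (real SQL Server LSNs are composite values) and computes the lowest watermark still needed, plus how far each secondary trails. Replica names are hypothetical, and other factors that can delay log truncation, such as pending log backups, are ignored.

```python
from typing import Dict


def truncation_floor(primary_lsn: int, hardened_lsn: Dict[str, int]) -> int:
    """Lowest LSN still needed by any replica; the log cannot be truncated past it."""
    return min([primary_lsn, *hardened_lsn.values()])


def lag(primary_lsn: int, hardened_lsn: Dict[str, int]) -> Dict[str, int]:
    """How far, in LSN units, each secondary's hardened watermark trails the primary."""
    return {name: primary_lsn - lsn for name, lsn in hardened_lsn.items()}


watermarks = {"SQL2": 1000, "SQL3": 640}   # SQL3 failed and stopped reporting at 640
print(truncation_floor(1000, watermarks))  # 640
print(lag(1000, watermarks))               # {'SQL2': 0, 'SQL3': 360}

# The primary keeps writing, but the failed replica pins the floor in place.
print(truncation_floor(5000, {"SQL2": 5000, "SQL3": 640}))  # 640
```

    Until SQL3 is repaired or removed, the floor never advances, which is the log-growth risk described in step 6.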


    Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.

    In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:

    • Synchronous-commit mode. The primary replica commits a given transaction only after all synchronous-commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to 2 synchronous-commit secondary replicas. Synchronous-commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.

    • Asynchronous-commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit secondary replicas, but no more than a total of 4 secondary replicas of any type. Asynchronous-commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.
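    The commit rule for the two availability modes, and the SQL Server 2012 limits on secondary replicas, can be modeled in Python. This is a simplified sketch, not SQL Server's implementation: LSNs are plain integers, replica names are hypothetical, and the `session_timed_out` flag stands in for the session-timeout behavior described in the following paragraph, under which a timed-out synchronous-commit secondary is temporarily treated as asynchronous.

```python
from dataclasses import dataclass
from typing import List

MAX_SECONDARIES = 4         # SQL Server 2012 limit stated in the text
MAX_SYNC_SECONDARIES = 2    # SQL Server 2012 limit stated in the text


@dataclass
class Secondary:
    name: str
    mode: str                        # configured mode: "synchronous" or "asynchronous"
    hardened_lsn: int                # last hardened LSN reported (modeled as an int)
    session_timed_out: bool = False

    @property
    def effective_mode(self) -> str:
        # A timed-out synchronous-commit secondary temporarily behaves as
        # asynchronous so that it cannot block commits on the primary.
        if self.mode == "synchronous" and self.session_timed_out:
            return "asynchronous"
        return self.mode


def validate(secondaries: List[Secondary]) -> None:
    if len(secondaries) > MAX_SECONDARIES:
        raise ValueError("at most 4 secondary replicas")
    if sum(s.mode == "synchronous" for s in secondaries) > MAX_SYNC_SECONDARIES:
        raise ValueError("at most 2 synchronous-commit secondary replicas")


def can_commit(commit_lsn: int, primary_hardened_lsn: int,
               secondaries: List[Secondary]) -> bool:
    # The primary always hardens its own log first.
    if primary_hardened_lsn < commit_lsn:
        return False
    # Only secondaries currently acting as synchronous-commit must acknowledge.
    return all(s.hardened_lsn >= commit_lsn
               for s in secondaries if s.effective_mode == "synchronous")


sql2 = Secondary("SQL2", "synchronous", hardened_lsn=500)
sql3 = Secondary("SQL3", "asynchronous", hardened_lsn=120)
validate([sql2, sql3])
print(can_commit(500, 500, [sql2, sql3]))  # True: SQL2 hardened 500; SQL3 is not waited for
print(can_commit(501, 501, [sql2, sql3]))  # False: still waiting on SQL2
sql2.session_timed_out = True
print(can_commit(501, 501, [sql2, sql3]))  # True: SQL2 is temporarily treated as asynchronous
```

    Clearing `session_timed_out` once the secondary has caught up corresponds to its automatic return to synchronous-commit operation.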

    For more information, see Availability Modes (http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).

    The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.

    Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for synchronous-commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and has caught back up with the primary replica, it automatically reverts to normal synchronous-commit mode operations.

    Availability Group Failover

    The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica. An availability group failover policy uses the FailureConditionLevel property, in conjunction with the sp_server_diagnostics system stored procedure, to indicate the severity tolerance level for a failure affecting the availability group. This same mechanism is used for FCI failover policies.


    In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over the role of primary replica. The availability group's virtual network name resource is then transferred to that instance. All client connections to the involved availability replicas are reset.

    Based upon the current health, synchronization state, and availability mode of the replicas, each replica has a composite failover readiness state that indicates the potential for data loss. This replica health information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states system view.

    Each availability replica also has a configured failover mode, which governs replica behavior when failover is indicated:

    • Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention. The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous-commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum. Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.

    • Manual failover. This allows the administrator to assess the state of the primary replica and make a decision whether to deliberately fail over to a secondary replica. Depending upon the availability mode and synchronization state, you have these choices:

      ◦ Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.

      ◦ Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous-commit availability mode, or if it is not synchronized with the primary replica.

    Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous-commit and then perform a planned manual failover. For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).
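    The eligibility rules for the three failover types can be combined into one function. In the Python sketch below, the `Replica` fields and the synchronization state labels are hypothetical; "automatic" in the result means only that the pair is eligible for automatic failover if the primary fails, and "forced manual" is always listed because it is the fallback that accepts potential data loss.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    failover_mode: str       # "automatic" or "manual"
    availability_mode: str   # "synchronous" or "asynchronous"
    on_fci: bool = False     # the hosting SQL Server instance is an FCI
    healthy: bool = True


def failover_options(primary: Replica, target: Replica,
                     sync_state: str, quorum_healthy: bool) -> List[str]:
    """Failover types available for moving the primary role to `target`."""
    both_sync = (primary.availability_mode == "synchronous"
                 and target.availability_mode == "synchronous")
    synchronized = sync_state == "SYNCHRONIZED"
    options: List[str] = []
    if (primary.failover_mode == "automatic" and target.failover_mode == "automatic"
            and both_sync and synchronized and quorum_healthy
            and not primary.on_fci and not target.on_fci):
        options.append("automatic")
    if both_sync and synchronized and primary.healthy and target.healthy:
        options.append("planned manual")
    # Always possible as a last resort; may lose data; disaster recovery only.
    options.append("forced manual")
    return options


p = Replica("automatic", "synchronous")
print(failover_options(p, Replica("automatic", "synchronous"), "SYNCHRONIZED", True))
# ['automatic', 'planned manual', 'forced manual']
print(failover_options(p, Replica("automatic", "synchronous", on_fci=True), "SYNCHRONIZED", True))
# ['planned manual', 'forced manual']
print(failover_options(p, Replica("manual", "asynchronous"), "SYNCHRONIZING", True))
# ['forced manual']
```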


    You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:

    • Failover mode is set to manual.
    • Availability mode is set to asynchronous commit.
    • Replica resides on an FCI.

    For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).

    Note: After a failover, if the new primary replica is not set to synchronous-commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous-commit mode.

    Availability Group Listener

    An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.

    The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.

    To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.

    At run time, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses until it is successful or until it reaches the connection timeout. The client will make these connection attempts in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.

    In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports. For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).
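To illustrate why parallel connection attempts speed up client failover, the sketch below races a TCP connection to every resolved address and keeps the first one that succeeds, which is roughly the behavior that MultiSubnetFailover=True enables in AlwaysOn-aware client libraries. It uses only the Python standard library; `connect_first` is a hypothetical helper, and it skips the DNS lookup and the TDS login that a real client library performs.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def _close_if_connected(future):
    """Done-callback: close a socket that connected after a winner was already chosen."""
    if not future.cancelled() and future.exception() is None:
        future.result().close()


def connect_first(endpoints, timeout=5.0):
    """Attempt TCP connections to all (host, port) endpoints in parallel.

    Returns ((host, port), socket) for the first connection that succeeds.
    Raises ConnectionError if every attempt fails or times out.
    """
    if not endpoints:
        raise ValueError("no endpoints to try")
    pool = ThreadPoolExecutor(max_workers=len(endpoints))
    futures = {pool.submit(socket.create_connection, ep, timeout): ep for ep in endpoints}
    try:
        for future in as_completed(futures):
            if future.exception() is not None:
                continue  # refused or timed out; keep waiting for the others
            # Winner found: ensure any other successful attempts are closed, not leaked.
            for other in futures:
                if other is not future:
                    other.add_done_callback(_close_if_connected)
            return futures[future], future.result()
    finally:
        pool.shutdown(wait=False)  # do not block on slower attempts
    raise ConnectionError("no endpoint accepted the connection")
```

A legacy client, by contrast, tries each address in turn, so an unreachable first address can cost a full TCP timeout before the next one is attempted.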


    Application Intent Filtering

    While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or whether it will exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.

    For the primary role and secondary role of each availability replica, you can also specify a connection access property that will be used as a connection-level filter on the client's application intent. By default, invalid application intent and connection access combinations result in a refused connection. SQL Server filters client connection requests using the following rules.

    While the availability replica is in the primary role, and connection access is equal to:

    • Allow any application intent. Do not filter any client connections for application intent.
    • Allow only explicit read/write intent. If the client specifies read-only, reject the connection.

    While the availability replica is in the secondary role, and connection access is equal to:

    • No connections allowed. Refuse all connections; the replica is used only for disaster recovery.
    • Allow any application intent. Do not filter any client connections for application intent.
    • Read-only application intent. If the client does not specify read-only, reject the connection.
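The filtering rules above can be expressed as a small lookup table. In this sketch the connection access values mirror the T-SQL ALLOW_CONNECTIONS settings (ALL and READ_WRITE for the primary role; NO, READ_ONLY, and ALL for the secondary role), and the intent values mirror the ApplicationIntent connection keyword. The function name `accepts_connection` is hypothetical, and the sketch models only the rules listed here.

```python
def accepts_connection(role, allow_connections, application_intent="ReadWrite"):
    """Return True if a replica in `role` admits a client with the given intent.

    role: "PRIMARY" or "SECONDARY"
    allow_connections: the ALLOW_CONNECTIONS setting for that role
    application_intent: "ReadWrite" (the default when unspecified) or "ReadOnly"
    """
    if application_intent not in ("ReadWrite", "ReadOnly"):
        raise ValueError(f"unknown application intent: {application_intent!r}")
    rules = {
        ("PRIMARY", "ALL"): lambda intent: True,                           # no filtering
        ("PRIMARY", "READ_WRITE"): lambda intent: intent == "ReadWrite",   # reject read-only
        ("SECONDARY", "NO"): lambda intent: False,                         # DR only: refuse all
        ("SECONDARY", "ALL"): lambda intent: True,                         # no filtering
        ("SECONDARY", "READ_ONLY"): lambda intent: intent == "ReadOnly",   # require read-only
    }
    try:
        rule = rules[(role, allow_connections)]
    except KeyError:
        raise ValueError(f"invalid setting {allow_connections!r} for role {role!r}") from None
    return rule(application_intent)
```

Because the default intent is read-write, a client that omits ApplicationIntent is refused by a secondary configured as READ_ONLY.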

    For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).

    Application Intent Read-Only Routing

    A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replicas. Workloads that can be readily adapted to run on a read-only secondary replica include reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad hoc queries.

    For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.

    Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function. For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
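The routing decision described above amounts to a walk down an ordered list. In this sketch the instance names, the `route_read_only` function, and the shape of the `replicas` dictionary are assumptions made for illustration. It reflects the note that no redirection happens while the primary replica is offline, and it returns None rather than guessing what a real listener does when no list entry qualifies.

```python
def route_read_only(routing_list, replicas, primary_online=True):
    """Pick the redirection target for a read-only-intent connection request.

    routing_list: ordered instance names configured for the current primary
    replicas: instance name -> (is_online_secondary, secondary ALLOW_CONNECTIONS setting)
    Returns the first qualifying instance name, or None if no redirection occurs.
    """
    if not primary_online:
        return None  # the listener is bound to the primary, so no primary means no routing
    for name in routing_list:
        online_secondary, allow = replicas.get(name, (False, "NO"))
        # A secondary satisfies read-only intent when it allows READ_ONLY or ALL.
        if online_secondary and allow in ("READ_ONLY", "ALL"):
            return name
    return None


if __name__ == "__main__":
    replicas = {
        "SQL2": (True, "NO"),          # online, but reserved for disaster recovery
        "SQL3": (False, "READ_ONLY"),  # readable, but currently offline
        "SQL4": (True, "ALL"),         # online and readable
    }
    print(route_read_only(["SQL2", "SQL3", "SQL4"], replicas))  # SQL4
```

Order matters: the first entry that passes the filters wins, so list the preferred reporting replica first.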


    Availability Improvements: Databases

    SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities. The following improvement reduces recovery time:

    • Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based upon the estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint, and of increasing recovery time (RTO) predictability. Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times. For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
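The contrast between a fixed checkpoint schedule and a recovery-time target can be illustrated with a toy simulation. This is not the SQL Server algorithm: it assumes that estimated recovery time is simply the log written since the last checkpoint divided by a constant redo rate, and the function name, parameters, and workload figures are all hypothetical.

```python
def worst_recovery_seconds(log_mb_per_sec, redo_mb_per_sec, *,
                           target_secs=None, interval_secs=None):
    """Simulate checkpoints second by second; return the worst estimated recovery time.

    Exactly one policy must be given:
      target_secs   - checkpoint once estimated recovery time reaches this target
      interval_secs - checkpoint every N seconds regardless of load
    """
    if (target_secs is None) == (interval_secs is None):
        raise ValueError("specify exactly one of target_secs or interval_secs")
    pending_mb = 0.0  # log written since the last checkpoint
    worst = 0.0
    for second, generated_mb in enumerate(log_mb_per_sec, start=1):
        pending_mb += generated_mb
        if target_secs is not None:
            # Target-driven: checkpoint as soon as estimated redo time reaches the target.
            if pending_mb / redo_mb_per_sec >= target_secs:
                pending_mb = 0.0
        elif second % interval_secs == 0:
            # Fixed interval: checkpoint on schedule, whatever the load.
            pending_mb = 0.0
        # Estimated recovery time if a failure occurred at the end of this second.
        worst = max(worst, pending_mb / redo_mb_per_sec)
    return worst


# Hypothetical workload (MB of log per second): quiet, a 5-second burst, quiet again.
workload = [1] * 10 + [50] * 5 + [1] * 10
print(worst_recovery_seconds(workload, 10, target_secs=3))     # 1.0
print(worst_recovery_seconds(workload, 10, interval_secs=60))  # 27.0
```

In this model the target-driven policy keeps the estimate below the target no matter how bursty the load is, while the fixed schedule lets it grow with the burst. The real engine's mechanism differs in detail, so treat this only as intuition for why a recovery-time target makes RTO more predictable.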

    These improvements mitigate common scenarios that can drive planned downtime:

    • Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
    • Online schema modification for new NOT NULL columns. If a new NOT NULL column with a default value is added to a SQL Server 2012 database table, only a schema lock is required to update system metadata; existing rows do not have to be populated during the ALTER TABLE statement. SQL Server will physically persist the default column value only if a row is actually modified or reindexed. Queries return the default value from metadata unless an actual column value exists.
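The metadata-default behavior for a new NOT NULL column can be modeled with a small in-memory table. This toy class is hypothetical and ignores pages, locks, and indexes; it shows only the rule stated above: the ALTER records a default in metadata, existing rows stay untouched, reads fall back to the metadata default, and the value is persisted only when a row is modified.

```python
class ToyTable:
    """Toy table where added NOT NULL columns keep their default in metadata only."""

    def __init__(self):
        self.rows = []               # physically stored values, one dict per row
        self.metadata_defaults = {}  # column -> default recorded by the ALTER

    def add_not_null_column(self, column, default):
        # Metadata-only change: no existing row is rewritten.
        self.metadata_defaults[column] = default

    def insert(self, **values):
        # New rows store the added columns' defaults unless a value is supplied.
        self.rows.append({**self.metadata_defaults, **values})

    def update(self, index, **values):
        row = self.rows[index]
        # Modifying a row persists any defaults it was still inheriting from metadata.
        for column, default in self.metadata_defaults.items():
            row.setdefault(column, default)
        row.update(values)

    def read(self, index, column):
        row = self.rows[index]
        # Use the stored value if one exists, otherwise the metadata default.
        return row[column] if column in row else self.metadata_defaults[column]


if __name__ == "__main__":
    t = ToyTable()
    t.insert(order_id=1)
    t.add_not_null_column("status", "new")
    print("status" in t.rows[0])  # False: the existing row was not rewritten
    print(t.read(0, "status"))    # new
    t.update(0, order_id=1)
    print(t.rows[0]["status"])    # new (now physically stored)
```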

    The following is an example of broader support for storage scenarios:

    • Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica. Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas. For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).
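The repair flow can be sketched as: verify a page when it is read, and if verification fails, take a copy from another replica whose copy verifies, then overwrite the damaged local copy. This sketch substitutes CRC-32 from Python's zlib module for SQL Server's own page checksum, performs the fetch synchronously (the feature described above does it asynchronously), and uses hypothetical names throughout.

```python
import zlib


def make_page(data: bytes):
    """Store page bytes together with a checksum computed at write time."""
    return (data, zlib.crc32(data))


def page_is_valid(page):
    data, stored_checksum = page
    return zlib.crc32(data) == stored_checksum


def read_page(page_id, local_pages, partner_replicas):
    """Return page bytes, repairing the local copy from a partner if it is corrupt."""
    page = local_pages[page_id]
    if page_is_valid(page):
        return page[0]
    for partner in partner_replicas:
        candidate = partner.get(page_id)
        if candidate is not None and page_is_valid(candidate):
            local_pages[page_id] = candidate  # replace the damaged copy
            return candidate[0]
    raise OSError(f"page {page_id} is corrupt and no replica has a good copy")
```

Each partner copy is verified before use, so a replica with its own corruption cannot overwrite the local page with bad data.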


    Client Connectivity Recommendations

    Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012 AlwaysOn technologies:

    • AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.0.2, and SQL Server Native Client 11.0.

    • Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener, or for an FCI that has IP addresses in multiple subnets.

    • Connection provider property: ApplicationIntent=ReadOnly. Where practical, use this keyword to offload read-only workloads from your primary replica onto the secondary replicas.

    • Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts, so when multiple IP addresses are present, they try to connect to each of them sequentially, until they encounter a TCP timeout or until they make a successful connection. You should adjust your connection timeout on legacy clients to accommodate the potential sequential timeouts and retries when multiple IP addresses are present, to a value that is at least 15 seconds + 21 seconds for every secondary replica.
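The legacy timeout guideline is simple arithmetic, and the recommended keywords combine into a single connection string. In this sketch the helper names, the listener name `aglisten`, and the database name `Sales` are hypothetical; the keywords follow the .NET SqlClient connection-string syntax, and other drivers may spell them differently.

```python
def legacy_connect_timeout(secondary_replicas: int) -> int:
    """Minimum legacy-client connection timeout: 15 s + 21 s per secondary replica."""
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must be non-negative")
    return 15 + 21 * secondary_replicas


def connection_string(listener, database, *, read_only=False,
                      legacy_client=False, secondary_replicas=0):
    """Build a SqlClient-style connection string that follows the guidelines above."""
    parts = [f"Server={listener}", f"Database={database}"]
    if legacy_client:
        # Legacy libraries lack the AlwaysOn keywords, so lengthen the timeout instead.
        parts.append(f"Connect Timeout={legacy_connect_timeout(secondary_replicas)}")
    else:
        parts.append("MultiSubnetFailover=True")
        if read_only:
            parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)


if __name__ == "__main__":
    print(legacy_connect_timeout(2))  # 57
    print(connection_string("aglisten", "Sales", read_only=True))
    # Server=aglisten;Database=Sales;MultiSubnetFailover=True;ApplicationIntent=ReadOnly
    print(connection_string("aglisten", "Sales", legacy_client=True, secondary_replicas=2))
    # Server=aglisten;Database=Sales;Connect Timeout=57
```

With two secondary replicas, for example, the guideline gives 15 + 2 × 21 = 57 seconds.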


    Conclusion

    This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

    Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios in a manner that can be well justified using RPO and RTO goals.

    For more information:

    http://www.microsoft.com/sqlserver/: SQL Server Web site

    http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

    http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter


    Version 1.1, 21 February 2012.
