sharepoint 2013 dr solution overview

SHAREPOINT 2013 DR SOLUTION

(WARM STAND-BY)

An overview of a workable solution for mid-size Enterprises

An example of implementation and DR Documentation content

Emilio Gratton, ICT Project Manager

EG IT Services

OUTLINE

Business Requirements

Recovery Time Objective (RTO) and Recovery Point Objective (RPO)

Prerequisites

Activation Scenarios

Schedule of events (workflows)

Logical System overview

Escalation matrix

DR procedures

Health checks

DR validation exercise

Event Summary and logs

BUSINESS REQUIREMENTS

100% availability (24/7) of personal files

Same availability (permitted downtime) as the hosting Data Centre

Service restored on the DR Data Centre in read/write mode within 30 minutes of a major Data Centre outage, planned or unplanned

Service restored on the DR farm in read-only mode within 30 minutes during a planned farm outage (for example, a service pack release)

RECOVERY TIME OBJECTIVE (RTO) AND RECOVERY POINT OBJECTIVE (RPO)

The RTO is the agreed time between a failure and the restoration of service. In this solution, it has been defined as 30 minutes.

The RPO has been defined as a minimum level of service, comprising:

Web Front End server restored (SharePoint 2013 main page accessible with links operational)

Search service restored (a query successfully displays results related to internal documentation)

Personal page displaying all links and documents
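
The 30-minute RTO can be checked mechanically during an exercise. The following is a minimal sketch, assuming the failure and restoration times are recorded as timestamps; the function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

# RTO agreed in this solution: 30 minutes between failure and restoration.
RTO = timedelta(minutes=30)


def rto_met(failure_at: datetime, restored_at: datetime, rto: timedelta = RTO) -> bool:
    """Return True when service was restored within the agreed RTO."""
    if restored_at < failure_at:
        raise ValueError("restoration cannot precede the failure")
    return restored_at - failure_at <= rto


# Hypothetical timestamps, for illustration only.
failure = datetime(2014, 3, 1, 9, 0)
print(rto_met(failure, datetime(2014, 3, 1, 9, 25)))  # True
print(rto_met(failure, datetime(2014, 3, 1, 9, 40)))  # False
```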

PREREQUISITES 1/2

Data Centres connected by a fast, reliable dedicated link

Hosts are virtualized to ensure host high availability (HA)

Windows servers are load balanced; SQL Servers are clustered

Infrastructure patch level is consistent across DCs (SCCM to monitor and report)

DR farm is kept in step with the latest updates applied to the Production farm

Customized code and solutions are kept updated in both farms

(http://blogs.msdn.com/b/sambetts/archive/2013/10/31/managing-custom-solutions-for-disaster-recovery-sharepoint-farms.aspx)
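
Keeping patch levels and custom solutions aligned across both farms amounts to comparing two inventories. The sketch below assumes each farm's inventory has already been exported as a name-to-version mapping (for example from SCCM reports); the item names and versions are placeholders, not values from this solution.

```python
from typing import Dict, Optional, Tuple

Inventory = Dict[str, str]  # item name -> installed version


def find_drift(production: Inventory, dr: Inventory) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return items whose version differs between farms or that exist in only one."""
    drift = {}
    for name in sorted(production.keys() | dr.keys()):
        prod_version = production.get(name)
        dr_version = dr.get(name)
        if prod_version != dr_version:
            drift[name] = (prod_version, dr_version)
    return drift


# Hypothetical inventories; names and versions are placeholders.
production_farm = {"branding.wsp": "1.4", "workflows.wsp": "2.0", "os_patch_level": "2014-03"}
dr_farm = {"branding.wsp": "1.3", "os_patch_level": "2014-03"}

for item, (prod_ver, dr_ver) in find_drift(production_farm, dr_farm).items():
    print(f"{item}: production={prod_ver} dr={dr_ver}")
```

An empty result means the two inventories match; a `None` on either side means the item is missing from that farm.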

PREREQUISITES 2/2

Local DR DB maintenance is managed as follows:

The SharePoint Admin manually maintains a local copy of the Configuration and Administrative DBs on the DR farm

This includes all DBs that are required at the DR farm but are not supported by SQL Server AlwaysOn Availability Groups with asynchronous commit for disaster recovery

A full list of the supported high availability and disaster recovery options for each SharePoint 2013 system and service application database is located here: http://technet.microsoft.com/en-us/library/jj841106(v=office.15).aspx



ACTIVATION SCENARIOS

Four main cases:

Data Centre outage

SharePoint farm incident

SharePoint farm planned outage

SharePoint farm standard maintenance

For each case define:

Rationale

DR feature

Actions (associated workflow)

Escalation points
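
The four cases and the attributes to define for each can be held in a small record type, which makes gaps in the documentation easy to spot. This is a sketch only: the class, the workflow identifier and the escalation point are hypothetical, and the rationale and DR feature shown are taken from the business requirements where those state them.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ActivationScenario:
    name: str
    rationale: str = ""
    dr_feature: str = ""
    workflow: str = ""  # name of the associated workflow
    escalation_points: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """List the attributes that still have to be defined for this case."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


scenarios = [
    ActivationScenario(
        name="Data Centre outage",
        rationale="Major planned or unplanned Data Centre outage",
        dr_feature="DR farm in read/write mode",
        workflow="WF-DC-OUTAGE",             # hypothetical identifier
        escalation_points=["Service Desk"],  # hypothetical
    ),
    ActivationScenario(name="SharePoint farm incident"),
    ActivationScenario(
        name="SharePoint farm planned outage",
        rationale="Service pack release",
        dr_feature="DR farm in read-only mode",
    ),
    ActivationScenario(name="SharePoint farm standard maintenance"),
]

for s in scenarios:
    print(s.name, "->", s.missing() or "complete")
```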

SCHEDULE OF EVENTS (WORKFLOWS)

Every scenario needs an associated workflow

Each workflow contains at least 3 stages:

Workflow activation, initial controls and notifications

Remediation steps

Final controls and notifications

Tasks are tailored to the Enterprise IT Operations’ procedures and teams
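
The three-stage rule lends itself to a simple completeness check before a workflow is signed off. The sketch below assumes a workflow is stored as a mapping from stage to task list; the stage keys and task wording are hypothetical.

```python
from typing import Dict, List

# The three minimum stages named above (the keys are hypothetical labels).
REQUIRED_STAGES = ("activation", "remediation", "final_controls")


def validate_workflow(workflow: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means every stage has tasks."""
    return [
        f"stage '{stage}' has no tasks"
        for stage in REQUIRED_STAGES
        if not workflow.get(stage)
    ]


# Hypothetical workflow; task wording is illustrative only.
dc_outage = {
    "activation": ["Confirm outage", "Notify stakeholders"],
    "remediation": ["Fail over databases", "Repoint DNS to DR farm"],
    "final_controls": [],
}

print(validate_workflow(dc_outage))  # ["stage 'final_controls' has no tasks"]
```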

LOGICAL SYSTEM OVERVIEW

[Diagram. Labels: Internal Users; DNS; Primary Data-Centre and Disaster Recovery Data-Centre, each with F5 LTM Load Balancing, Web Front End and Office Web Apps servers, Application Servers and a Microsoft SQL Server database; Replica (Auto-Failover); Replica (Async); Config DBs; Admin DBs]

LOGICAL SYSTEM OVERVIEW - COMMENTS

The previous slide is a simplified overview of the three server tiers:

Web

Application

Database

The DNS servers point only to the Production farm

The DR farm is not operational, but its servers are up and running

DR DBs receive log records only after transactions have completed on the primary (asynchronous replica)

Config and Admin DBs are maintained locally on the DR farm
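
Because DNS points at only one farm at a time, a quick way to confirm which farm is live is to resolve the SharePoint host name and compare the result with each farm's load-balancer address. This is a sketch under stated assumptions: the virtual IPs and the host name are placeholders, and a stub resolver replaces real DNS so the example runs anywhere.

```python
import socket
from typing import Callable

# Hypothetical virtual IPs (documentation address ranges), not from this solution.
PRODUCTION_VIP = "192.0.2.10"
DR_VIP = "198.51.100.10"


def active_farm(hostname: str, resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Report which farm the given host name currently resolves to."""
    address = resolve(hostname)
    if address == PRODUCTION_VIP:
        return "production"
    if address == DR_VIP:
        return "dr"
    return "unknown"


# A stub resolver stands in for real DNS in this illustration.
print(active_farm("sharepoint.example.com", resolve=lambda name: PRODUCTION_VIP))  # production
```

In normal operation the expected answer is "production"; after a failover workflow completes it should be "dr".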

ESCALATION MATRIX

The escalation matrix has to be defined in accordance with Enterprise incident procedures.

It should include escalation points outside the organization (vendors or Microsoft)

DR PROCEDURES

Use this section of the manual to detail all tasks contained in the workflows:

Communications

Network tasks

DNS/Server tasks

DB tasks

HEALTH CHECKS

In this section, the SharePoint team lists the checks performed to confirm that service has been restored on the other farm
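
Health checks are easiest to repeat when each one is a named probe that returns pass or fail. The sketch below uses the three minimum-service criteria from the RTO/RPO section as check names; the probes are stubs, and real ones (for example an HTTP request to the main page) would be written by the SharePoint team.

```python
from typing import Callable, Dict


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every probe; a probe that raises an exception counts as a failure."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    return results


def service_restored(results: Dict[str, bool]) -> bool:
    """Service counts as restored only when at least one check ran and all passed."""
    return bool(results) and all(results.values())


# Stub probes for illustration; replace each lambda with a real check.
checks = {
    "main page accessible": lambda: True,
    "search returns results": lambda: True,
    "personal page shows links and documents": lambda: False,
}

results = run_health_checks(checks)
print(results)
print(service_restored(results))  # False
```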

DR VALIDATION EXERCISE

Once the SharePoint farms are configured, run a DR exercise to validate the workflows and the associated tasks

For each exercise, prepare a specific Event Summary Log file that contains:

Overview of the RTO and RPO under validation

Tested scenarios

Detailed event log for each test (see following slide)

EVENT SUMMARY AND LOGS

For each test record:

Participants (roles and names)

Schedule of events:

Planned activity progress (in minutes)

Actual activity progress as recorded

Task category

Role performing the activity

Action required

Comments/issues/notes

If you have Lync, WebEx or another group chat solution, you can create a conversation with all participants, record all events and save the conversation for review or training purposes
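
The fields listed above map naturally onto one record per logged activity, from which delays against plan and total elapsed time can be derived. The following is a sketch with hypothetical field names and sample entries; only the 30-minute RTO comes from this solution.

```python
from dataclasses import dataclass
from typing import List

RTO_MINUTES = 30  # RTO defined in this solution


@dataclass
class EventLogEntry:
    planned_minute: int  # planned activity progress
    actual_minute: int   # real activity progress as recorded
    task_category: str
    role: str
    action_required: str
    comments: str = ""

    @property
    def delay(self) -> int:
        """Minutes behind (positive) or ahead of (negative) the plan."""
        return self.actual_minute - self.planned_minute


def elapsed(entries: List[EventLogEntry]) -> int:
    """Minutes from the start of the test to the last recorded activity."""
    return max(e.actual_minute for e in entries)


# Hypothetical entries for one test.
log = [
    EventLogEntry(0, 0, "Communications", "Incident Manager", "Declare DR"),
    EventLogEntry(5, 7, "DB tasks", "DBA", "Fail over databases"),
    EventLogEntry(15, 18, "DNS/Server tasks", "Network team", "Repoint DNS"),
    EventLogEntry(25, 27, "Communications", "SharePoint team", "Confirm health checks"),
]

print([e.delay for e in log])       # [0, 2, 3, 2]
print(elapsed(log) <= RTO_MINUTES)  # True
```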

ABOUT THE AUTHOR

Emilio Gratton

15+ years’ IT Infrastructure Project Management

PRINCE2 Registered Practitioner

Broad experience of SharePoint infrastructure and solutions delivery

Email: [email protected]
