sharepoint 2013 dr solution overview

SHAREPOINT 2013 DR SOLUTION (WARM STAND-BY). An overview of a workable solution for mid-size Enterprises. An example of implementation and DR Documentation content. Emilio Gratton – ICT Project Manager, EG IT Services

Upload: emilio-gratton

Post on 17-Nov-2014

DESCRIPTION

SharePoint 2013 DR solution: An overview of a workable solution for mid-size Enterprises. An example of implementation and DR Documentation content. Outline: Business Requirements - Recovery Time Objective (RTO) and Recovery Point Objective (RPO) - Prerequisites - Activation Scenarios - Schedule of events (workflows) - Logical System overview - Escalation matrix - DR procedures - Health checks - DR validation exercise - Event Summary and logs

TRANSCRIPT

Page 1: SharePoint 2013 DR solution overview

SHAREPOINT 2013 DR SOLUTION

(WARM STAND-BY)

An overview of a workable solution

for mid-size Enterprises

An example of implementation and

DR Documentation content

Emilio Gratton – ICT Project Manager

EG IT Services

Page 2: SharePoint 2013 DR solution overview

OUTLINE

Business Requirements

Recovery Time Objective (RTO) and Recovery Point Objective (RPO)

Prerequisites

Activation Scenarios

Schedule of events (workflows)

Logical System overview

Escalation matrix

DR procedures

Health checks

DR validation exercise

Event Summary and logs

Page 3: SharePoint 2013 DR solution overview

BUSINESS REQUIREMENTS

100% availability (24/7) of personal files

Same availability and downtime allowance as the hosting Data Centre

Service restored on the DR Data Centre in read/write mode within 30 minutes in case of a major planned or unplanned Data Centre outage

Service restored on the DR farm in read-only mode within 30 minutes in case of a planned farm outage (for example, a service pack release)

Page 4: SharePoint 2013 DR solution overview

RECOVERY TIME OBJECTIVE (RTO) AND RECOVERY POINT OBJECTIVE (RPO)

The RTO is the agreed time duration between a failure and the restoration of service. In this solution, it has been defined as 30 minutes.

The RPO has been defined as a minimum service with the following:

Web Front End server restored (SharePoint 2013 main page accessible with links operational)

Search service restored (a query successfully displays results related to internal documentation)

Personal page displaying all links and documents

Page 5: SharePoint 2013 DR solution overview

PREREQUISITES 1/2

Data Centres are connected by a fast and reliable dedicated link

Hosts are virtualized to ensure host HA

Windows servers are load balanced; SQL Servers are clustered

Infrastructure patch level is consistent across DCs (SCCM to monitor and report)

DR farm is kept in line with the latest updates applied to the Production farm

Customized code and solutions are kept updated in both farms (blogs.msdn.com/..../managing-custom-solutions-for-disaster-recovery-sharepoint-farms)
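
The requirement that both farms stay at the same patch level can be checked mechanically once an inventory of each farm is available. The Python sketch below is only an illustration: the function name, the inventory format and the version labels are assumptions, and it does not query SCCM or SharePoint itself.

```python
def patch_drift(production, dr):
    """Compare two {component: version} inventories.

    Returns {component: (production_version, dr_version)} for every
    component whose version differs or that exists on one farm only.
    """
    drift = {}
    for component in sorted(set(production) | set(dr)):
        prod_version = production.get(component)
        dr_version = dr.get(component)
        if prod_version != dr_version:
            drift[component] = (prod_version, dr_version)
    return drift


# Hypothetical inventories; the version labels are placeholders.
production_farm = {"os": "2014-10", "sharepoint": "2014-09", "custom-solution": "1.4"}
dr_farm = {"os": "2014-10", "sharepoint": "2014-08"}

for component, (prod_version, dr_version) in patch_drift(production_farm, dr_farm).items():
    print(f"{component}: production={prod_version} dr={dr_version}")
```

An empty result means the two inventories match; anything else is a candidate item for the patching report.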

Page 6: SharePoint 2013 DR solution overview

PREREQUISITES 2/2

Local DR DBs maintenance is managed as follows:

SharePoint Admin manually maintains a local copy of the Configuration and Administrative DBs on the DR farm

This includes all DBs that are required at the DR farm but are not supported by SQL Server AlwaysOn Availability Groups with asynchronous-commit for disaster recovery.

A full list of the supported high availability and disaster recovery options for SharePoint 2013 databases is available on TechNet (http://technet.microsoft.com/): "High availability and disaster recovery options for each SharePoint 2013 system and service application database"

Page 7: SharePoint 2013 DR solution overview

ACTIVATION SCENARIOS

Four main cases:

Data Centre outage

SharePoint farm incident

SharePoint farm planned outage

SharePoint farm standard maintenance

For each case define:

Rationale

DR feature

Actions (associated workflow)

Escalation points
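
One way to keep the four cases and their four attributes consistent in the DR documentation is a small record per scenario. The following Python sketch is illustrative only: the class and method names are hypothetical, and the attribute values are left empty because the slide does not provide them.

```python
from dataclasses import dataclass, field

# The four activation cases named on the slide.
SCENARIOS = (
    "Data Centre outage",
    "SharePoint farm incident",
    "SharePoint farm planned outage",
    "SharePoint farm standard maintenance",
)


@dataclass
class ActivationScenario:
    name: str
    rationale: str = ""
    dr_feature: str = ""
    actions: list = field(default_factory=list)            # steps of the associated workflow
    escalation_points: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of the attributes that are still undefined."""
        required = ("rationale", "dr_feature", "actions", "escalation_points")
        return [name for name in required if not getattr(self, name)]


catalogue = [ActivationScenario(name) for name in SCENARIOS]
for scenario in catalogue:
    print(scenario.name, "still needs:", scenario.missing_fields())
```

Running the completeness check before each DR exercise highlights scenarios whose documentation is still incomplete.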

Page 8: SharePoint 2013 DR solution overview

SCHEDULE OF EVENTS (WORKFLOWS)

Every scenario needs an associated workflow

Each workflow contains at least 3 stages:

Workflow activation, initial controls and notifications

Remediation steps

Final controls and notifications

Tasks are tailored to the Enterprise IT Operations’ procedures and teams
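
The three-stage rule can be verified automatically when workflows are kept as structured data. This is a minimal Python sketch under assumptions: the stage identifiers, the function name and the sample task descriptions are all hypothetical and should be replaced with the enterprise's own.

```python
# Short identifiers for the three mandatory stages, in execution order.
REQUIRED_STAGES = ("activation", "remediation", "final")


def validate_workflow(tasks):
    """Check a workflow given as (stage, description) tuples in execution order.

    Returns a list of problems; an empty list means the workflow contains
    all three mandatory stages in the expected order.
    """
    problems = []
    stages = [stage for stage, _ in tasks]
    for stage in REQUIRED_STAGES:
        if stage not in stages:
            problems.append(f"missing stage: {stage}")
    # Extra stages are allowed, but the mandatory ones must not be reordered.
    order = [REQUIRED_STAGES.index(s) for s in stages if s in REQUIRED_STAGES]
    if order != sorted(order):
        problems.append("stages out of order")
    return problems


# Hypothetical workflow for illustration.
example_workflow = [
    ("activation", "Notify IT Operations that the DR workflow has started"),
    ("remediation", "Bring the DR farm into service"),
    ("final", "Run final controls and notify users"),
]
print(validate_workflow(example_workflow))
```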

Page 9: SharePoint 2013 DR solution overview

LOGICAL SYSTEM OVERVIEW

[Diagram. Labels: Internal Users; DNS; Primary Data-Centre and Disaster recovery Data-Centre, each with F5 LTM Load Balancing, Web Front End / Office Web Apps servers, Application servers and a Microsoft SQL Server database; SQL Server replicas labelled "Replica (Auto-Failover)" and "Replica (Async)"; Config DBs and Admin DBs.]

Page 10: SharePoint 2013 DR solution overview

LOGICAL SYSTEM OVERVIEW - COMMENTS

The previous slide is a simplified overview of the three server tiers:

Web

Application

Database

The DNS servers point only to the Production farm

The DR farm is not operational, but its servers are up and running

DR DBs receive log records after transactions have been committed on the primary, which does not wait for the DR replica (asynchronous replica)

Config and Admin DBs are maintained locally on the DR farm

Page 11: SharePoint 2013 DR solution overview

ESCALATION MATRIX

The escalation matrix has to be defined in accordance with the Enterprise incident procedures.

It should include escalation points outside the organization (vendors or Microsoft)

Page 12: SharePoint 2013 DR solution overview

DR PROCEDURES

Use this section of the manual to detail all tasks contained in the workflows:

Communications

Network tasks

DNS/Server tasks

DB tasks

Page 13: SharePoint 2013 DR solution overview

HEALTH CHECKS

In this section the SharePoint team lists the checks performed to confirm that the service has been restored on the other farm
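
As an illustration, the checks could mirror the minimum service defined earlier in the deck (main page, search, personal page). The Python sketch below makes several assumptions: the URLs are hypothetical, authentication is ignored, and an HTTP 200 response only shows that a page is reachable, not that its links or search results are correct, so manual confirmation is still needed.

```python
import urllib.request

# Hypothetical URLs: replace with the real farm addresses.
CHECKS = {
    "main page": "https://sharepoint.example.com/",
    "search": "https://sharepoint.example.com/search?k=test",
    "personal page": "https://my.sharepoint.example.com/",
}


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return None


def run_health_checks(checks, fetch=http_status):
    """Return {check name: True/False}; a check passes on HTTP 200."""
    return {name: fetch(url) == 200 for name, url in checks.items()}
```

Calling `run_health_checks(CHECKS)` after a failover returns one pass/fail flag per check, which can be copied into the event log. The `fetch` parameter lets the team substitute a client that handles the farm's authentication.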

Page 14: SharePoint 2013 DR solution overview

DR VALIDATION EXERCISE

Once the SharePoint farms are configured, run a DR exercise to validate the workflows and the associated tasks

For each exercise prepare a specific Event Summary Log file that contains:

Overview of the RTO and RPO under validation

Tested scenarios

Detailed event log for each test (see following slide)

Page 15: SharePoint 2013 DR solution overview

EVENT SUMMARY AND LOGS

For each test record:

Participants (roles and names)

Schedule of events:

Activity progress (in minutes)

Real activity progress as recorded

Task Category

Role performing the activity

Action Required

Comments/issues/notes

If you have Lync, WebEx or another group chat solution, you can create a conversation with all participants, record all events and save the conversation for review or training purposes.
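
The schedule-of-events columns map naturally onto one record per task, which also makes it easy to compare the recorded timings against the 30-minute RTO. The following Python sketch is illustrative: the class, function and sample entries are hypothetical, and only the column names and the RTO value come from the slides.

```python
from dataclasses import dataclass

RTO_MINUTES = 30  # RTO defined for this solution


@dataclass
class EventLogEntry:
    planned_minute: int   # activity progress as scheduled
    actual_minute: int    # real activity progress as recorded
    task_category: str
    role: str
    action_required: str
    comments: str = ""


def summarize(entries, rto_minutes=RTO_MINUTES):
    """Summarize one test: total elapsed time, RTO verdict and late tasks."""
    elapsed = max((e.actual_minute for e in entries), default=0)
    late = [
        (e.action_required, e.actual_minute - e.planned_minute)
        for e in entries
        if e.actual_minute > e.planned_minute
    ]
    return {"elapsed_minutes": elapsed, "rto_met": elapsed <= rto_minutes, "late_tasks": late}


# Hypothetical entries for illustration.
test_log = [
    EventLogEntry(0, 0, "Communications", "Incident Manager", "Announce DR activation"),
    EventLogEntry(10, 14, "DB tasks", "DBA", "Fail over databases"),
    EventLogEntry(25, 28, "DNS/Server tasks", "Network engineer", "Repoint DNS"),
]
print(summarize(test_log))
```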

Page 16: SharePoint 2013 DR solution overview

ABOUT THE AUTHOR

Emilio Gratton

15+ years’ IT Infrastructure Project Management

PRINCE2 Registered Practitioner

Broad experience in SharePoint infrastructure and solutions delivery

Email: [email protected]
