sharepoint 2013 dr solution overview
DESCRIPTION
SharePoint 2013 DR solution: An overview of a workable solution for mid-size Enterprises. An example of implementation and DR Documentation content.
Outline: Business Requirements - Recovery Time Objective (RTO) and Recovery Point Objective (RPO) - Prerequisites - Activation Scenarios - Schedule of events (workflows) - Logical System overview - Escalation matrix - DR procedures - Health checks - DR validation exercise - Event Summary and logs
TRANSCRIPT
SHAREPOINT 2013 DR SOLUTION
(WARM STAND-BY)
An overview of a workable solution for mid-size Enterprises
An example of implementation and DR Documentation content
Emilio Gratton – ICT Project Manager
EG IT Services
OUTLINE
Business Requirements
Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
Prerequisites
Activation Scenarios
Schedule of events (workflows)
Logical System overview
Escalation matrix
DR procedures
Health checks
DR validation exercise
Event Summary and logs
BUSINESS REQUIREMENTS
100% availability (24/7) of personal files
Same availability level as the hosting Data Centre
Service restored on the DR Data Centre in read/write mode within 30 minutes in case of a major planned or unplanned Data Centre outage
Service restored on the DR farm in read-only mode within 30 minutes in case of a planned farm outage (for example, a service pack release)
RECOVERY TIME OBJECTIVE (RTO) AND RECOVERY POINT OBJECTIVE (RPO)
The RTO is the agreed time between a failure and the restoration of service. In this solution, it has been defined as 30 minutes.
The RPO normally expresses the maximum acceptable data loss, measured in time. In this solution, it has instead been defined as a minimum restored service, consisting of the following:
Web Front End server restored (SharePoint 2013 main page accessible with links operational)
Search service restored (a query successfully displays results related to internal documentation)
Personal page displaying all links and documents
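As an illustration of the 30-minute RTO defined above, the following minimal Python sketch checks whether a recorded restoration time falls within the objective. The function name and the sample timestamps are assumptions made for this example, not part of the original solution.

```python
from datetime import datetime, timedelta

# RTO agreed in this solution: 30 minutes between failure and restoration.
RTO = timedelta(minutes=30)


def rto_met(failure: datetime, restored: datetime, rto: timedelta = RTO) -> bool:
    """Return True when service was restored within the agreed RTO."""
    if restored < failure:
        raise ValueError("restoration cannot precede the failure")
    return restored - failure <= rto


# Hypothetical timestamps, as they might be recorded during a DR exercise.
failure_at = datetime(2014, 1, 1, 9, 0)
print(rto_met(failure_at, datetime(2014, 1, 1, 9, 25)))  # restored after 25 minutes
print(rto_met(failure_at, datetime(2014, 1, 1, 9, 45)))  # restored after 45 minutes
```

A check of this kind could be run against the timestamps captured in the event log of each validation exercise.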
PREREQUISITES 1/2
Data Centres are connected with a fast and reliable dedicated link
Hosts are virtualized to ensure host HA
Windows servers are load balanced, SQL Servers are clustered
Infrastructure patch level is consistent across DCs (SCCM to monitor and report)
DR farm is kept in line with the latest updates applied to the Production farm
Customized code and solutions are kept updated in both farms
(blogs.msdn.com/..../managing-custom-solutions-for-disaster-recovery-sharepoint-farms )
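The requirement to keep custom solutions aligned across both farms can be illustrated with a small comparison. This is only a sketch: the inventories are hypothetical dictionaries (solution package name mapped to deployed version), and how they are collected from each farm is left out.

```python
from typing import Dict, Optional, Tuple


def solution_drift(
    production: Dict[str, str], dr: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each solution to (production_version, dr_version) where the farms differ.

    A version of None means the solution is missing from that farm.
    """
    names = production.keys() | dr.keys()
    return {
        name: (production.get(name), dr.get(name))
        for name in sorted(names)
        if production.get(name) != dr.get(name)
    }


# Hypothetical inventories, for illustration only.
prod_farm = {"branding.wsp": "1.4", "workflows.wsp": "2.0"}
dr_farm = {"branding.wsp": "1.3", "workflows.wsp": "2.0"}
print(solution_drift(prod_farm, dr_farm))
```

An empty result would indicate that both farms carry the same solutions at the same versions.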
PREREQUISITES 2/2
Local DR DBs are maintained as follows:
The SharePoint Admin manually maintains a local copy of the Configuration and Administrative DBs on the DR farm
This includes all DBs that are required at the DR farm but are not supported by SQL Server AlwaysOn Availability Groups with asynchronous-commit for disaster recovery
A full list of the supported high availability and disaster recovery options for each SharePoint 2013 system and service application database is available on TechNet (http://technet.microsoft.com/)
ACTIVATION SCENARIOS
Four main cases:
Data Centre outage
SharePoint farm incident
SharePoint farm planned outage
SharePoint farm standard maintenance
For each case define:
Rationale
DR feature
Actions (associated workflow)
Escalation points
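The four cases and the items to define for each could be captured in a simple record, which also makes it easy to spot what is still undocumented. This is a minimal sketch: the class and field names are assumptions, and the two DR features filled in are taken from the business requirements, while everything else is deliberately left empty.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActivationScenario:
    name: str
    rationale: str = ""
    dr_feature: str = ""
    actions: List[str] = field(default_factory=list)          # associated workflow
    escalation_points: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of the items that are still undefined."""
        items = {
            "rationale": self.rationale,
            "dr_feature": self.dr_feature,
            "actions": self.actions,
            "escalation_points": self.escalation_points,
        }
        return [name for name, value in items.items() if not value]


SCENARIOS = [
    ActivationScenario("Data Centre outage", dr_feature="DR farm in read/write mode"),
    ActivationScenario("SharePoint farm incident"),
    ActivationScenario("SharePoint farm planned outage", dr_feature="DR farm in read-only mode"),
    ActivationScenario("SharePoint farm standard maintenance"),
]

for scenario in SCENARIOS:
    print(scenario.name, "-> still to define:", scenario.missing_fields())
```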
SCHEDULE OF EVENTS (WORKFLOWS)
Every scenario needs an associated workflow
Each workflow contains at least 3 stages:
Workflow activation, initial controls and notifications
Remediation steps
Final controls and notifications
Tasks are tailored to the Enterprise IT Operations' procedures and teams
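The three-stage structure described above can be sketched as an ordered list of stages, each holding named tasks. This is an illustrative sketch only: the task names are hypothetical, and each task is a stub that returns True where a real procedure would be carried out by the relevant team.

```python
from typing import Callable, List, Tuple

Task = Tuple[str, Callable[[], bool]]
Stage = Tuple[str, List[Task]]


def run_workflow(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failed task. Returns a log."""
    if len(stages) < 3:
        raise ValueError("a workflow needs at least 3 stages")
    log: List[str] = []
    for stage_name, tasks in stages:
        for task_name, task in tasks:
            ok = task()
            log.append(f"{stage_name} | {task_name} | {'ok' if ok else 'FAILED'}")
            if not ok:
                return log  # stop so the failure can be escalated
    return log


# Hypothetical workflow for a Data Centre outage; every task is a stub.
dc_outage_workflow: List[Stage] = [
    ("Activation", [("Notify IT Operations", lambda: True)]),
    ("Remediation", [("Redirect users to DR farm", lambda: True)]),
    ("Final controls", [("Run health checks", lambda: True),
                        ("Notify users", lambda: True)]),
]

for line in run_workflow(dc_outage_workflow):
    print(line)
```

Stopping at the first failed task is a design choice for this sketch; an actual workflow might instead branch to an escalation step.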
LOGICAL SYSTEM OVERVIEW
[Diagram: Internal Users are directed by DNS to the Primary Data-Centre. Each Data-Centre shows F5 LTM Load Balancing in front of Web Front End and Office Web Apps servers, Application servers, and a Microsoft SQL Server database tier. The database replicas are labelled Replica (Auto-Failover) and Replica (Async); Config DBs and Admin DBs are shown in the Disaster recovery Data-Centre.]
LOGICAL SYSTEM OVERVIEW - COMMENTS
The previous slide is a simplified overview of the three server tiers:
Web
Application
Database
The DNS servers point only to the Production farm
The DR farm is not operational, but its servers are up and running
DR DBs receive logs only after transactions are completed on the primary (asynchronous replica)
Config and Admin DBs are maintained locally on the DR farm
ESCALATION MATRIX
The escalation matrix has to be defined according to the Enterprise incident procedures.
It should include escalation points outside the organization (vendors or Microsoft).
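One simple way to represent an escalation matrix is as a list of time thresholds and the role to involve once each is passed. The sketch below is an assumption-laden example: the thresholds and internal role names are hypothetical and would come from the Enterprise incident procedures.

```python
from typing import List, Tuple

# (minutes since activation, escalation point). All values are hypothetical.
ESCALATION_MATRIX: List[Tuple[int, str]] = [
    (0, "SharePoint on-call engineer"),
    (15, "IT Operations manager"),
    (30, "External support (vendor or Microsoft)"),
]


def escalation_points(elapsed_minutes: int) -> List[str]:
    """Return every escalation point whose threshold has been reached."""
    return [
        role
        for threshold, role in ESCALATION_MATRIX
        if elapsed_minutes >= threshold
    ]


print(escalation_points(20))
```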
DR PROCEDURES
Use this section of the manual to detail all tasks contained in the workflows:
Communications
Network tasks
DNS/Server tasks
DB tasks
HEALTH CHECKS
In this section the SharePoint team lists the checks performed to confirm that the service has been restored on the other farm
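The health checks could mirror the minimum service criteria given earlier (main page accessible, search returning results, personal page displaying content). The sketch below only shows the reporting structure: each check is a stub callable, and the real probes (for example HTTP requests against the restored farm) are assumptions left to the SharePoint team.

```python
from typing import Callable, Dict, Tuple


def run_health_checks(
    checks: Dict[str, Callable[[], bool]]
) -> Tuple[Dict[str, bool], bool]:
    """Run every check and return (per-check results, overall service restored)."""
    results = {name: bool(check()) for name, check in checks.items()}
    return results, all(results.values())


# Stub checks for illustration; real checks would probe the restored farm.
checks = {
    "Main page accessible with links operational": lambda: True,
    "Search query returns internal documentation": lambda: True,
    "Personal page displays all links and documents": lambda: False,
}

results, restored = run_health_checks(checks)
for name, ok in results.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
print("Service restored:", restored)
```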
DR VALIDATION EXERCISE
Once the SharePoint farms are configured, run a DR
exercise to validate the workflows and the
associated tasks
For each exercise, prepare a specific Event Summary Log file that contains:
Overview of RTO and RPO under validation
Tested scenarios
Detailed event log for each test (see following slide)
EVENT SUMMARY AND LOGS
For each test record:
Participants (roles and names)
Schedule of events:
  Activity progress (in minutes)
  Real activity progress as recorded
  Task Category
  Role performing the activity
  Action Required
  Comments/issues/notes
If you have Lync, WebEx or another group chat solution, you can create a conversation with all participants, record all events and save the conversation for review or training purposes
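The schedule-of-events fields listed above map naturally onto one record per activity, from which the delay against plan can be derived. This is a minimal sketch: the class name, field names and sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventLogEntry:
    planned_minute: int      # activity progress as planned (in minutes)
    actual_minute: int       # real activity progress as recorded
    task_category: str
    role: str
    action_required: str
    comments: str = ""

    @property
    def delay(self) -> int:
        """Minutes behind (positive) or ahead of (negative) the plan."""
        return self.actual_minute - self.planned_minute


def total_duration(entries: List[EventLogEntry]) -> int:
    """Minute at which the last recorded activity took place."""
    return max(entry.actual_minute for entry in entries)


# Hypothetical entries from a DR validation exercise.
log = [
    EventLogEntry(0, 0, "Communications", "Incident manager", "Declare DR activation"),
    EventLogEntry(10, 14, "DNS/Server tasks", "Network engineer", "Point DNS to DR farm"),
    EventLogEntry(25, 28, "Health checks", "SharePoint admin", "Confirm service restored"),
]

for entry in log:
    print(entry.task_category, "delay:", entry.delay, "min")
print("Total duration:", total_duration(log), "min")
```

Comparing the total duration with the RTO gives a direct pass or fail result for the tested scenario.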
ABOUT THE AUTHOR
Emilio Gratton
15+ years’ IT Infrastructure Project Management
PRINCE2 Registered Practitioner
Experience in several SharePoint infrastructure and solutions deliveries
Email: [email protected]