maintaining business continuity after internal and external incidents john duff, ph.d. copyright...

Post on 24-Dec-2015


Maintaining Business Continuity After Internal and External Incidents

John Duff, Ph.D.

Copyright John Duff 2008. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

2004 Got Our Attention…….

We are here

The class of 2008 was evacuated four times during their freshman year

We are six feet above sea level

Actually lower than......

Old plan was….

Keep teaching until the water is chest deep…..

This is How We Responded - New plan….

PREPARE

RECOVER

CONTINUE

Keep Everyone Safe

Preserve the Enterprise

The ITS plan was developed in this context

Katrina Effect on top of 2004

The effect of Katrina was to cause us to focus on Business Continuity

How can we survive as an

enterprise if our campus is

damaged substantially?

Question & Challenge

Can we transition from a Residential College to

a distributed, virtual college?

What can ITS do to help make this happen?

Components of Strategy

Leadership

Emergency Management Group

Executive Emergency Management Team

Local Emergency Management groups

Equipped with:

Satellite Phones

Aircards

Components of Strategy

Students

Emergency shelters identified

Transportation provided

Faculty

Evacuate

Severe weather syllabus required & posted online

Staff

Follow advice of local emergency officials

ITS Challenge

Maintain full functionality – anywhere, anytime access:

• Web services

• Payment methods

• Remote access to critical business and academic processes

• Email – existing email and means to stay in contact during and after an event

• Support delivery of academic program & library services

Become a virtual organization

Identified Requirements

Where did we need Hot Fail Over?

• WWW

• Intranet – myEckerd

• WebCT

• CGI

• Webmail

• ECWeb

• LDAP

• Sendmail – manual DNS entry change

Fail over managed by:

Cisco CSS global and local load balancing
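The failover decision that the load balancer makes can be illustrated with a short sketch. This is not Cisco CSS configuration and not the authors' implementation; it only shows the idea of probing sites in priority order and serving from the first healthy one. The site names, the `tcp_probe` helper, and the port are assumptions made for illustration.

```python
import socket
from typing import Callable, Optional, Sequence


def tcp_probe(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Hypothetical health check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def pick_site(sites: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Example usage (hostnames are placeholders, not real servers):
#   active = pick_site(["www.campus.example", "www.colo.example"], tcp_probe)
```

Passing the health check in as a function keeps the selection logic testable without touching the network.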

Identified Requirements

What additional services can be made available on short notice?

• Administrative Software

    • Banner

    • Touchnet – payment gateway

    • Library database access

• Academic resources

    • Wiki

    • Online course materials

    • Schedules

    • Rosters

To Deliver Services Requires Co-location

Selected Peak-10 in Tampa

Factors influencing the decision

- Cost of the pipe (50 Mbps Metro Ethernet)

- Proximity

- Elevation - 30’ vs 6’

Co-location Servers

Assigned multiple roles to servers to reduce cost

• Database Sun Fire V240

    • Banner

    • Aims

    • Oracle

    • MySQL

    • PostgreSQL

• Mail Sun Fire V210

    • DNS

    • SMTP

    • POP/IMAP

    • LDAP

    • IMP Webmail

• Web Services Sun Fire V210

• Payment/Windows Dell PowerEdge 700

Additional Hardware

Network

Firewall

Switch

VPN

Router

Cisco CSS

Console Access

WTi 16-port Serial Switch

Three 9-pin/RJ45 null-modem adapters

External Modem

Tape Backup attached to Backup Express

Storage System

• Sun StorEdge 6130s replaced single-host RAIDs; before this project, there was no storage consolidation

• Critical systems (our student information system, billing system, Web and distance learning, plus e-mail, according to our Board of Trustees) are now consolidated on the Sun StorEdge 6130

Storage Environment

• Two identical configurations in St. Pete and Tampa

• Single Cisco fiber switch with FCIP gateway

• Sun StorEdge 6130: single tray, 2 TB of storage

• Dual fiber connections to hosts

• Old-school backups: remote ufsdump server or array FS snapshots

Replication Strategies

• Application-based, operating system–based, or array-based?

• Pluses and minuses to each approach; in the end we chose a mix

• Replication of sendmail server was our biggest question mark

• Oracle Data Guard for Oracle on our student information system

• Mix of array replication and other tools (rsync, etc.) for Web services
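As an illustration of the rsync portion of the mix, the sketch below builds an rsync command line for mirroring a Web content directory to the co-location site. The slides do not give the actual flags, paths, or hostnames, so everything here (`/var/www`, `colo.example.edu`, the choice of `-az --delete` over SSH) is an assumption, and the command defaults to a dry run.

```python
from typing import List


def build_rsync_command(
    src_dir: str,
    dest_host: str,
    dest_dir: str,
    dry_run: bool = True,
) -> List[str]:
    """Build (but do not run) an rsync command that mirrors src_dir to dest_host."""
    # -a: archive mode, -z: compress in transit,
    # --delete: remove files on the destination that no longer exist at the source,
    # -e ssh: use SSH as the transport.
    cmd = ["rsync", "-az", "--delete", "-e", "ssh"]
    if dry_run:
        # Report what would change without modifying the destination.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    src = src_dir.rstrip("/") + "/"
    cmd += [src, f"{dest_host}:{dest_dir}"]
    return cmd


# Example usage (placeholder path and host); run with subprocess.run(cmd, check=True):
#   cmd = build_rsync_command("/var/www", "colo.example.edu", "/var/www")
```

Building the command as a list rather than a shell string avoids quoting problems when it is later handed to `subprocess.run`.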

When do We Execute?

Established a five-day pre-event timeline

Day 1

5-day CONE

Day 2

Day 3

3-day CONE

Day 4

Day 5

EVENT

Power down

Staff and students evacuate campus

24-hour window for staff to evacuate the area

Schedule full backups on key servers, switch library remote dbs to colo site

Power down campus servers

Activate colo servers
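The server-side actions in this timeline lend themselves to a scripted runbook. The sketch below is one possible shape for such a script, not the procedure the authors used: the step names are taken from the slide (with the backup and library items read as two separate steps, which is an assumption), while the `Step` type, the `noop` placeholder actions, and the dry-run default are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # the real work; placeholder in this sketch


def noop() -> None:
    """Placeholder action; a real runbook would call backup or power scripts here."""


def run_steps(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Run steps in order and return a log. An exception in any step stops the run."""
    log: List[str] = []
    for step in steps:
        if dry_run:
            log.append(f"DRY RUN: {step.name}")
            continue
        step.action()
        log.append(f"DONE: {step.name}")
    return log


# Order follows the slide; treating it as strictly sequential is an assumption.
PRE_EVENT_STEPS = [
    Step("Schedule full backups on key servers", noop),
    Step("Switch library remote dbs to colo site", noop),
    Step("Power down campus servers", noop),
    Step("Activate colo servers", noop),
]

# Example usage:
#   for line in run_steps(PRE_EVENT_STEPS):
#       print(line)
```

Defaulting to a dry run makes the same script usable in tabletop exercises and live tests.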

Post-event timeline

D-Day +1

D-Day +2

D-Day +3

D-Day +4

D-Day +5

Damage assessment

Staff Returns to campus

Begin power up on Campus

Power up campus servers

Re-sync begins

Novell, Library return to normal

Up to 4 days

Developing a Culture of Testing

ITS

Tabletop exercise

Live tests

Scripts, power down procedures, etc.

Now at <30 minutes to bring up co-location site

Campus-wide test of Unit Plans

An average of 40 users test annually

Blue Sky assignments – test VPN access & query database

Pre/Post meetings

Other Considerations

Mailroom

Stop

Locate

Re-route

Timing

Phone Service

How do we operate without the switch?

Lessons Learned

• More cultural than technical at this point

• Cost is always an issue – how can we best leverage the co-located site and other resources?

• Knowledge transfer and sharing is critical – technology is great, but the single point of failure is an individual

• There is never a good time to test – build a schedule and stay with it

THANK YOU

John Duff, Ph.D. - Acting Director of ITS

duffja@eckerd.edu

727-864-8318

Walter Moore – Senior Systems Administrator

moorewr@eckerd.edu

727-864-8318
