Maintaining Business Continuity After Internal and External Incidents

John Duff, Ph.D.

Copyright John Duff 2008. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

2004 Got Our Attention…….

We are here

The class of 2008 was evacuated four times during their freshman year

We are six feet above sea level

Page 4: Maintaining Business Continuity After Internal and External Incidents John Duff, Ph.D. Copyright John Duff 2008. This work is the intellectual property

Actually lower than......

Old plan was….

Keep teaching until the water is chest deep…..

This is How We Responded - New plan….

PREPARE

RECOVER

CONTINUE

Keep Everyone Safe

Preserve the Enterprise

The ITS plan was developed in this context

Katrina Effect on top of 2004

The effect of Katrina was to cause us to focus on Business Continuity

How can we survive as an enterprise if our campus is damaged substantially?

Question & Challenge

Can we transition from a Residential College to a distributed, virtual college?

What can ITS do to help make this happen?

Components of Strategy

Leadership

Emergency Management Group

Executive Emergency Management Team

Local Emergency Management groups

Equipped with:

Satellite Phones

Aircards

Components of Strategy

Students

Emergency shelters identified

Transportation provided

Faculty

Evacuate

Severe weather syllabus required & posted online

Staff

Follow advice of local emergency officials

ITS Challenge

Maintain full functionality – anywhere, anytime access:

• Web services
• Payment methods
• Remote access to critical business and academic processes
• Email – existing email and means to stay in contact during and after an event
• Support delivery of academic program & library services

Become a virtual organization

Identified Requirements

Where did we need Hot Fail Over?

• WWW
• Intranet – myEckerd
• WebCT
• CGI
• Webmail
• ECWeb
• LDAP
• Sendmail – manual DNS entry change

Fail over managed by:

Cisco CSS global and local load balancing
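The failover idea on this slide can be illustrated with a minimal Python sketch. It is not the Cisco CSS configuration, which the slides do not show; it only models sending traffic to the first healthy site in priority order. The `Site` type, the site names, the addresses (taken from reserved documentation ranges), and the `tcp_probe` helper are all assumptions made for illustration.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Site:
    name: str     # hypothetical label, e.g. "campus" or "colo"
    address: str  # hypothetical address from a reserved documentation range


def tcp_probe(site: Site, port: int = 80, timeout: float = 2.0) -> bool:
    """Report whether a TCP connection to the site succeeds within the timeout."""
    try:
        with socket.create_connection((site.address, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_active_site(
    sites: Sequence[Site], is_healthy: Callable[[Site], bool]
) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all fail."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Priority order: campus first, co-location site second (assumed ordering).
campus = Site("campus", "192.0.2.10")
colo = Site("colo", "198.51.100.10")

if __name__ == "__main__":
    active = pick_active_site([campus, colo], tcp_probe)
    print(active.name if active else "no site reachable")
```

Passing the health check in as a function keeps the selection logic separate from the probe, so the same logic can be exercised in a tabletop exercise without touching the network.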

Identified Requirements

What additional services can be made available on short notice?

• Administrative Software
  • Banner
  • Touchnet – payment gateway
  • Library database access

• Academic resources
  • Wiki
  • Online course materials
  • Schedules
  • Rosters

To Deliver Services Requires Co-location

Selected Peak-10 in Tampa

Factors influencing the decision:

- Cost of the pipe (50 Mb Metro Ethernet)
- Proximity
- Elevation: 30' at the Tampa site vs. 6' on campus

Co-location Servers

Assigned multiple roles to servers to reduce cost

• Database: Sun Fire V240
  • Banner
  • Aims
  • Oracle
  • mysql
  • postgresql

• Mail: Sun Fire V210
  • DNS
  • SMTP
  • POP/IMAP
  • LDAP
  • IMP Webmail

• Web Services: Sun Fire V210

• Payment/Windows: Dell PowerEdge 700

Additional Hardware

Network

Firewall

Switch

VPN

Router

Cisco CSS

Console Access

WTi 16-port Serial Switch

Three 9-pin/RJ45 null-modem adapters

External Modem

Tape Backup attached to Backup Express

Storage System

• Sun StorEdge 6130s replaced single-host RAIDs; before this project, there was no storage consolidation

• Critical systems (our student information system, billing system, Web, and distance learning, plus e-mail according to our Board of Trustees) are now consolidated on the Sun StorEdge 6130

Storage Environment

• Two identical configurations in St. Pete and Tampa

• Single Cisco fiber switch with FCIP gateway

• Sun StorEdge 6130: single tray, 2 TB of storage

• Dual fiber connections to hosts

• Old-school backups: remote ufsdump server or array FS snapshots

Replication Strategies

• Application-based, operating system–based, or array-based?
• Pluses and minuses to each approach; in the end we chose a mix
• Replication of sendmail server was our biggest question mark
• Oracle Data Guard for Oracle on our student information system
• Mix of array replication and other tools (rsync, etc.) for Web services
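As an illustration of the file-level part of that mix, here is a minimal Python sketch that builds an rsync command for mirroring a Web content directory to the co-location host. The slides name rsync but do not give the actual commands, so the directory paths, the host name `colo-web.example.edu`, and the choice of options are assumptions. The sketch defaults to `--dry-run` so that nothing is copied or deleted until the flag is turned off deliberately.

```python
import shlex
from typing import List


def build_rsync_command(
    source_dir: str, remote_host: str, dest_dir: str, dry_run: bool = True
) -> List[str]:
    """Build an rsync argument list that mirrors source_dir to remote_host:dest_dir."""
    cmd = [
        "rsync",
        "--archive",   # preserve permissions, times, symlinks, and so on
        "--delete",    # remove files at the destination that no longer exist at the source
        "--compress",  # compress data in transit over the link
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing anything
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{dest_dir}")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    command = build_rsync_command("/var/www/html", "colo-web.example.edu", "/var/www/html")
    print(shlex.join(command))
    # To actually run it: subprocess.run(command, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems if a path contains spaces.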

When do We Execute?

Established a five-day pre-event timeline:

Day 1 (5-day CONE), Day 2, Day 3 (3-day CONE), Day 4, Day 5, EVENT

Actions on the timeline:

• Power down
• Staff and students evacuate campus
• 24-hour window for staff to evacuate the area
• Schedule full backups on key servers
• Switch library remote dbs to colo site
• Power down campus servers
• Activate colo servers

Post-event timeline

D-Day +1, D-Day +2, D-Day +3, D-Day +4, D-Day +5

• Damage assessment
• Staff returns to campus
• Begin power up on campus
• Power up campus servers
• Re-sync begins
• Novell, Library return to normal
• Up to 4 days

Developing a Culture of Testing

ITS

• Tabletop exercise
• Live tests
• Scripts, power down procedures, etc.
• Now at <30 minutes to bring up co-location site

Campus-wide test of Unit Plans

• An average of 40 users test annually
• Blue Sky assignments – test VPN access & query database
• Pre/Post meetings
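One way to keep a live test honest about the bring-up time is to record how long each scripted step takes and compare the total with the 30-minute figure quoted above. The following Python sketch shows the idea; the step names and the placeholder actions are hypothetical, and the real scripts are not shown in the slides.

```python
import time
from typing import Callable, List, Sequence, Tuple

TARGET_SECONDS = 30 * 60  # the "<30 minutes" bring-up figure from the slide


def run_timed_steps(
    steps: Sequence[Tuple[str, Callable[[], None]]],
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float]]:
    """Run each step in order and return (step name, elapsed seconds) pairs."""
    timings = []
    for name, action in steps:
        start = clock()
        action()
        timings.append((name, clock() - start))
    return timings


def within_target(
    timings: Sequence[Tuple[str, float]], target_seconds: float = TARGET_SECONDS
) -> bool:
    """Report whether the total elapsed time is strictly under the target."""
    return sum(duration for _, duration in timings) < target_seconds


if __name__ == "__main__":
    # Placeholder actions; a real test would call the actual procedures here.
    demo_steps = [
        ("power down campus servers", lambda: None),
        ("activate colo servers", lambda: None),
    ]
    results = run_timed_steps(demo_steps)
    for name, seconds in results:
        print(f"{name}: {seconds:.1f}s")
    print("within target" if within_target(results) else "over target")
```

The clock is injected as a parameter so the timing logic can itself be tested with a fake clock, without waiting for real steps to run.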

Other Considerations

Mailroom

Stop

Locate

Re-route

Timing

Phone Service

How do we operate without the switch?

Lessons Learned

• More cultural than technical at this point
• Cost is always an issue – how can we best leverage the co-located site and other resources?
• Knowledge transfer and sharing is critical – technology is great, but the single point of failure is an individual
• There is never a good time to test – build a schedule and stay with it

THANK YOU

John Duff, Ph.D. - Acting Director of ITS

[email protected]

727-864-8318

Walter Moore – Senior Systems Administrator

[email protected]

727-864-8318