Maintaining Business Continuity After Internal and External Incidents
Greg Schaffer, CISSP
Director of Network Services
Middle Tennessee State University
Copyright Greg Schaffer 2008. This work is the intellectual property of the author. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Our Story Begins Like Many…
It was late in the afternoon one weekday when suddenly alarms sounded in the NOC. It was clear SOMETHING had happened, because connectivity was shattered across campus. Students could not access online classes, purchase orders could not be processed, email would not go through…
BUSINESS DISCONTINUITY
Troubleshooting the Problem
It was relatively easy to pinpoint what wasn’t talking to what.
The fact that many things were not talking to many other things indicated that more than one “thing” was affected.
A check of devices indicated the problem was not equipment but at the physical layer.
It was clear that this was going to take SOME TIME to fix!
Location, Location, Location
The relative location of the physical layer issue was determined to be at or on the site of new stadium construction.
However, there were no initial indications of anything wrong.
When asked, the construction workers said they had not been digging…
BUT
…neglected to mention they had been pile driving rocks to prepare a trench for a new water line.
The concrete-encased conduits were damaged by the equipment.
The area was excavated to reveal what we hoped was minimal damage…
Getting Services Up
While the extent of the physical damage wasn’t clear until complete excavation was done the next morning, it was clear that there was enough physical damage to assume that the conduits would not be usable for replacement fiber optics.
There were redundant fiber cables between data centers that took different routes across campus…
Forming the Plan
…except for one portion, which happened to be the pulverized area!
A plan was needed to restore communications…fast
The plan:
– access manholes on either end of the damage and splice new fibers in manholes
– run fibers temporarily on the road, and close the road to all traffic (planned anyway)
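The redundant routes above shared one portion of conduit, and that was the portion that failed. A minimal sketch of how such an overlap could be checked from route documentation, assuming each fiber route is recorded as an ordered list of waypoints. The waypoint names (`DC-A`, `MH-1`, and so on) and the function name are hypothetical and do not come from the campus plant records:

```python
def shared_segments(route_a, route_b):
    """Return the conduit segments used by both routes, ignoring direction."""
    def segments(route):
        # Each consecutive pair of waypoints is one physical segment.
        return {frozenset(pair) for pair in zip(route, route[1:])}
    return segments(route_a) & segments(route_b)


# Hypothetical routes between two data centers (illustrative only).
primary = ["DC-A", "MH-1", "MH-2", "MH-3", "MH-6", "DC-B"]
secondary = ["DC-A", "MH-4", "MH-2", "MH-3", "MH-5", "DC-B"]

for segment in shared_segments(primary, secondary):
    # Any segment listed here is a single point of failure for both routes.
    print(" - ".join(sorted(segment)))
```

An empty result would suggest the two routes are physically diverse, at least as far as the documentation is accurate.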
But Almost Down Again!
Graduation was that Saturday
Road opened for visitors
Temporary fibers had vehicles driving over them most of the day!
Fibers held, but needless to say they would not be reused…
Post Mortem
Eventually (nearly one month later) a manhole was constructed around the break, and new fibers pulled through the repaired area and spliced
Despite “normal” controls (“Tennessee One Call”, conduits encased in concrete, redundant fibers, etc.) “Bad Stuff” happened
Bad Stuff = Good Lessons
Operations Security Controls
Preventive
Detective
Corrective
Directive
Recovery
Deterrent
Compensating
CISSP CBK
Preventive/Detective
Failed:
– Tennessee One Call (dirt covered markings)
– Hardened Physical Paths
Worked (but after the fact)
– Network monitoring
– Help desk reporting
– Documentation
Corrective/Directive
Worked
– Emergency Web Communications
– Temporary fiber construction (temporary corrective control for Business/Mission Continuity)
– Shovel
Failed
– Blocking car and truck traffic
Recovery
More of a longer term approach to prevent the same occurrence
Redundant fiber between data centers
Must also consider separate building entrances
Cost of solution vs cost of downtime analysis
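A cost of solution vs cost of downtime analysis can be roughed out with simple arithmetic. The sketch below is illustrative only: the dollar figures and the outage frequency are hypothetical assumptions, not numbers from this incident. Only the 10-hour outage length matches the downtime reported in this talk. It uses the common single loss expectancy and annualized loss expectancy formulation.

```python
def single_loss_expectancy(downtime_cost_per_hour, outage_hours):
    """Estimated cost of one outage."""
    return downtime_cost_per_hour * outage_hours


def annualized_loss_expectancy(sle, outages_per_year):
    """Expected yearly loss: cost per outage times expected outages per year."""
    return sle * outages_per_year


# Hypothetical inputs (assumptions for illustration).
solution_cost = 50_000           # e.g. a fully separate fiber route
downtime_cost_per_hour = 2_000   # assumed cost to the institution per hour
outage_hours = 10                # outage length reported in this talk
outages_per_year = 0.25          # assumed: one such event every four years

sle = single_loss_expectancy(downtime_cost_per_hour, outage_hours)
ale = annualized_loss_expectancy(sle, outages_per_year)
payback_years = solution_cost / ale

print(f"Cost per outage: {sle}")
print(f"Expected yearly loss: {ale}")
print(f"Years for the solution to pay for itself: {payback_years}")
```

If the payback period is longer than the expected life of the solution, the redundancy may be hard to justify on downtime cost alone.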
Deterrent/Compensating
Worked:
– Penalty/Insurance
– Temporary fiber run
– Cutting of ducts
– Creation of new manhole
Finally
It ended up being a late night, hampered by many events. Our DR/BC plan did not specifically address this problem...NOR SHOULD IT HAVE. A good DR/BC plan is flexible and adaptive. The necessary resources were mobilized quickly based on existing DR/BC plans. What could have been a very large disaster goes down as a downtime that lasted 10 hours.