University of Sheffield
Corporate Information and Computing Services
Risk Register
February 2016
CiCS manages the risks to the ICT infrastructure that supports most of the vital functions of the University. Our internal risk management information will have a more
complex structure than the register layout suggested. Each system or service supports various types of vital activity in the University and is at risk from many causes,
each with its own preventive measures. Systems are also heavily interdependent. The recommended Risk Register layout has too few dimensions to express this
complexity and can only provide a simple summary.
Loss or degradation of any ICT system would clearly be categorised as an infrastructure risk, but this could be misleading as learning, financial processes, research etc.
could all be affected equally. Instead we have given more than one category, as a number, against most risks. To give them in order of risk exposure would be almost
meaningless and this has not been attempted.
We have not listed planned actions against most risks. Many major actions were identified earlier and are now complete. They are shown as controls in the register.
These are either major investments in equipment and facilities, or ongoing processes within CiCS. Further actions are likely to be identified as the new assessment
process develops.
CiCS has a Business Continuity Plan and a great deal of other information to be used when various types of incidents occur. This has not been mentioned in the Register
as it applies to every risk.
We have not listed opportunities alongside the risks. They are a very different type of issue in this area.
Categories (most CiCS items relate to 1, 4, 6 and 7)
1. Estates, Infrastructure, IT, Business Continuity
2. External relationships and partners
3. Financial
4. Learning, Teaching, Student experience
5. Organisational development and strategy
6. Research and innovation
7. Service quality
8. Staffing and Human Resources
Register columns: Category; Description; Inherent (L, I); Controls in place; Residual (L, I); Further Actions; Due date and person; Other
Risks related to resourcing
1, 4, 6, 7 Underfunding and obsolescence of IT resources.
Lack of investment leads to reliability and security issues, plus accumulation of long term maintenance backlog which reduces flexibility, reduces product quality, increases upgrade times and generates unpredictability. Departments may then take things into their own hands, creating a chaotic set of facilities.
L M Constant attention to changing needs and funding issues, including continuation of capital budget.
Prioritise spending and effort clearly.
Current funding issues affect this risk.
M M
1,7 Loss of key personnel.
Many systems are understood by only one person, who could leave, become ill, have an accident etc.
Some systems could then be difficult to maintain, with extended downtime, or projects could be delayed.
M M Identify key personnel
Share expertise and train others in key skills.
Document key systems carefully.
Maintain staff retention through health and morale in the workplace, deal with grievances properly.
Standardise systems and processes to reduce reliance on individuals. Outsource specialist skills to competent third parties.
Use SRDS etc effectively.
M L
Risks involving loss of use of software
1 Major supplier goes out of business or is taken over by a rival.
Leads to inability to maintain specific items, unreliability or high cost of replacing everything supplied.
M H Use purchasing agreements that include alternate suppliers.
Use of major suppliers – more likely to be taken over if they fail.
Awareness of alternative approaches.
Avoid obvious dependencies.
M M
1, 4 Software licensing changes.
Software supplier could withdraw licenses for vital items, significantly change the price, go out of business, sell rights etc.
Affects costs and ability to provide items widely. Risk prosecution if unaware.
M M Awareness of forthcoming changes and alternatives.
Membership of consortia.
Negotiate good deals.
Check contracts.
Good relations with suppliers.
Pay annual costs when due.
L M
1, 4, 6, 7 Failure of applications software, eg triggered by incompatibility due to changes in related systems or by increased demands.
This could lead to unavailability of particular services and possible extra costs.
M M Keep software maintenance contracts up to date.
Application of patches and implementation of new versions as they are provided for central systems.
Commitment to maintenance windows so software & hardware for key business systems can be kept supportable.
Appropriate testing of software and its effective and robust configuration, version and change management.
Good documentation of procedures and dependencies.
M L
Risk arising from external constraints
1 Changes to systems required by legislation, or new requirements from funding bodies.
Possible requirement to stop using existing processes if not planned ahead.
M M Constant attention to and understanding of legislation and good liaison with funding bodies.
Information Security Officer in place.
PCI DSS compliance being addressed.
M L
1, 2 Outsourced services (including Google Apps for Education, Blackboard Learn, Talis, Planon, and StarRez)
Potential for loss of services and service quality, reputation and security risks, legal issues (data protection, export laws).
M M Careful checking of third-party services and systems.
Use only major players.
Correct contractual agreements.
Effective Supplier Relationship Management.
Constant monitoring and robust exit plans in place.
L L
Risks related to internal communication and understanding of needs
7 Unrealistic expectations of CiCS services.
Failure to manage expectations and understanding of services could lead to bad feeling, complaints, unrealistic demands and pressure – lowering service quality.
M M Range of communication channels to inform users of services and changes.
Customer Services and Communication team established.
Liaison channels established and continue to develop, especially at faculty level.
Develop effective mechanisms for prioritisation involving all relevant stakeholders.
L L
7,6,1 Failure to provide services that are needed.
If CiCS does not provide IT services that departments need (or want) they will find their own solutions. This would lead to duplication of effort, extra overall cost, poorer quality and reliability and reduced cooperation.
M M As for ‘Unrealistic expectations’ above.
Working in an agile way to more quickly provide solutions
Introducing a business partnership model for better understanding Faculty IT requirements
L L
Risks arising from malicious activity
1,7 Internal fraud or sabotage.
Disgruntled (or simply untrained) staff could do enormous damage to IT systems and data or bring the University into disrepute
L VH Attention to health and morale of staff.
Charter for System Administrators is regularly issued to all relevant staff.
Access rights are allocated carefully for key systems.
Training on security procedures.
Information Security Manager in place.
L M Information Security awareness raising programme across the University includes guidance on security and statement of personal responsibilities.
1, 4, 6, 7 Unauthorised access to computers and data.
May be malicious, hacking etc. Could lead to serious breach of University security or create a route to ‘hack’ other sites.
H M A Code of Conduct for all computer users, including security advice, is published and promoted.
Appropriate detection systems in place.
Monitor attempts to gain unauthorised access.
Allocate rights carefully for sensitive systems.
Manage expired accounts properly.
M L Information Security awareness raising programme across the University includes guidance on security and statement of personal responsibilities.
1, 4, 6, 7 Virus or other malware attack, or software vulnerabilities.
Malicious software can damage any IT system, or prevent normal service by sheer volume of extra traffic.
The problem could spread through many computers including to other sites and take days to clear.
Denial of Service Attacks to University or outsourced systems.
VH VH Use high-quality virus/malware detection systems at the centre and encourage or enforce use on all users’ computers.
Scan all incoming email using state-of-the-art tools.
Apply all patches to keep key software (virus-scanning software, operating systems and major applications) up to date on all computers, central and personal.
Automatically block infected machines from using the network.
Restrict the services that can be fully accessed by users.
Encourage use of ‘managed’ computers and CiCS hosting environment.
Issue clear advice to users.
H M
Risks leading to loss of data, or loss of access to data
1 Failure of data backup systems.
This would destroy the security of data making it vulnerable to many other risks. Loss of current data with no recent backup would be a major and very disruptive incident.
M H Maintenance contract on backup systems.
Follow protocols correctly.
Store data safely and in more than two places.
Duplication of main storage systems means separate backup is becoming less critical.
M M Backup strategy is under review.
Work with users to have a realistic backup strategy.
1 Loss of data from major systems or their failure.
Major services would become unavailable for a period, eg Financial or HR systems, student management etc.
Most likely cause would be equipment failure.
H VH Reliable backup systems and procedures in place.
Data is mirrored between two computer centres. A third copy is mirrored to a third location, optionally with a different retention time.
Virtual servers allow functions to be moved seamlessly to other equipment.
Some systems duplicated across two machine rooms with some possibility of failover.
Central systems accessible from offsite for ease of maintenance out of office hours.
L M
Risks related to loss of electronic connectivity
1 Loss of internet connection to the University.
Loss of the Joint Academic Network connection could be due to faults, damage, commercial decisions or other external issues that may not be under control of the national academic community. No computer-based communication off site, including email, would be possible.
H VH Duplication of main connection.
We have 2 independent connections to the JANET network providing our internet connection, but both of these have recently failed at the same time.
Temporary web pages available so that the University does not vanish from the web.
Attention to cable routes during excavations.
Appropriate maintenance on all relevant equipment.
H VH Recent losses of our internet connection through either DDoS attacks on Janet or physical damage have led us to open discussions with Janet about a third connection.
1,7 Loss of telephone connection. This would disrupt all normal voice communication, and would be a major problem in an emergency such as a fire.
M H Use reliable provider, with secondary one for emergencies.
Additional connections so not reliant on one point of connection.
Attention to cable routes during excavations.
Universal access to mobile phone system.
L L
1 Damage or failure to network infrastructure.
Any component could be damaged accidentally or maliciously, or simply fail.
All electronic communication including telephones could be lost, most likely to specific areas.
H H Major links across campus are duplicated.
Vital services can operate from one of the two computer rooms.
Network breaks are detected automatically.
Spares for vital components are kept on site.
Network equipment is housed in locked rooms and cupboards across the campus.
Staff are very experienced in repairing routine failures.
H L Recent losses of our internet connection through either DDoS attacks on Janet or physical damage have led us to open discussions with Janet about a third connection.
Introducing better liaison with EFM.
Risks causing equipment to become unavailable
1 Mains power failure to vital equipment.
Power-down of key equipment would disable electronic communication and central IT services.
H VH Central equipment is duplicated between two computer rooms.
Backup power systems in place for both computer rooms, battery and generator based for short and long-term power.
Warning systems and contingency procedures in place.
Regular testing of power backup systems.
CiCS has significant experience of this type of event.
H M New UPS and power tree implemented in CC DC; transition to new service 90% complete.
Working on Shared Data Centre project with JISC and northern universities
1 Air-conditioning failure in one of the two computer rooms.
Leads to overheating of computer room and likely need to shut down at least some equipment. Result is loss or degradation of some services.
M H Redundancy built in to current air-conditioning systems.
Vital services can operate from just one computer room, with effort and disruption in some cases.
Emergency air-conditioning can be arranged at short notice.
Maintenance contract and attention to air-conditioning maintenance (?)
M L Air-conditioning systems are being reviewed and should soon be replaced to improve efficiency and reliability.
Working on Shared Data Centre project with JISC and northern universities
1 Flood in a computer room.
These are low-lying areas with many cable ducts entering below ground level, with a potential to carry water into the buildings. Flooding would disrupt power and other cabling leading to loss of services. There are associated dangers to staff from electricity or electrolytic generation of flammable gas.
L H Mains wiring and sockets moved to above underfloor ‘ground’ level. New overhead power rails installed.
Sump and water detection system in place with pump.
Most ducts are watertight.
Vital services can be run from one computer room.
L M Working on Shared Data Centre project with JISC and northern universities
1 Theft of vital equipment, or associated damage.
Targeted theft of IT equipment has occurred at other Universities in the past.
Vital services would be disrupted until replacements installed.
Confidential data could be taken.
Risk to personal safety in a violent theft.
L H Computer rooms are physically secure, with careful security procedures.
Vital services can operate from just one computer room, with effort and disruption in some cases.
Data is replicated to other locations.
Alarms and CCTV in place.
L M
1 Gas leak into computer room – eg carried underground via cable ducts.
Risk of explosion, need to evacuate building. Possible need to switch off vital equipment.
L H Gas detection in place.
Most ducts are sealed.
Vital services can operate from just one computer room, with effort and disruption in some cases.
L L Working on Shared Data Centre project with JISC and northern universities
Risks involving damage to or major failure of equipment
1 Major hardware failure.
Can be caused by a range of events, both accidental and malicious.
Depending on which hardware fails, vital services could be disrupted or communications lost.
M VH Vital services can be run from one of the two computer rooms.
Central equipment is on rapid-response maintenance contracts and many spares are held on site.
M M
1 Escape of high-pressure water or steam from main CHP pipes.
These feed many buildings, including the Computing Centre. Pipes tend to be in the same spaces as Network equipment. Pressure is said to be such that structures could be damaged. Communications could be disrupted including to Security control centre.
L H Vital services can be operated from one of the two computer rooms.
Redundancy is built in to the major network links.
CHP systems well maintained (?)
L M Working on Shared Data Centre project with JISC and northern universities
1 Fire in a computer room or other vital space.
This could damage vital equipment and stop computer or communication services.
L VH Fire detection systems in place.
Fire suppressant systems in computer rooms.
Vital services can be run from either of two computer rooms.
L M Working on Shared Data Centre project with JISC and northern universities
1 Lightning strike.
Antennae are most vulnerable, but current surges could destroy any local equipment, leading to loss of any computer or communication services.
L H Lightning conductors in place on computer buildings.
Data connections are now fibre, so surges not conducted to other buildings.
Vital services can be run from either of two computer rooms.
L M
1 Failure of telephone exchange system.
This is essentially another very reliable computer server. Failure would result in loss of all voice communication across the campus and externally.
M M Telephone system is split across two computer rooms and can function from either alone.
Maintenance contract is in place and attention given to quality of installation.
Mobile phones in almost universal use.
M L
Risks relating to unavailability of physical estate
1 Serious damage or loss of use of a Data Centre.
L VH Vital services can operate from just one computer room, with effort and disruption in some cases.
Good maintenance of computer rooms.
Good security, fire detection, burglar alarms etc. in place.
VESDA fire prevention system in place.
Third Data Centre is in place.
L M Working on Shared Data Centre project with JISC and northern universities
1 Loss of use of a building or part building.
CiCS could be called on to provide facilities for displaced staff, possibly by releasing one or more student computer rooms.
This would take CiCS staff resource from other tasks and degrade the service to students.
M M Most vital systems are designed to be used by staff working from home and web access is a standard condition for most software procurement.
Student computer rooms have telephone points – one has a large number of them for use as an emergency communications room.
M M
1,4 Loss of significant amounts of teaching space, for example if a building closed at short notice.
There is enormous pressure on the University teaching estate and the student experience could be seriously negatively impacted by loss of teaching space.
L H There are backup locations in place, both external and in non-teaching space at the University, although this can incur significant expense. The timetabling team have been familiarised with the entire University estate and can work closely with EFM colleagues to organise a solution quickly.
L M
Non-ICT Units: University Print Service
1,4 Print Service unable to function for a short period due to loss or unavailability of premises, power, equipment, communication links or key staff. This could delay the availability of vital materials such as exam papers or degree certificates.
L H A business continuity plan is in place, which includes identification of external suppliers.
L M
Non-ICT Units: Performance Spaces
1, 2, 4, 7 Octagon Centre unable to provide space for major or minor events, internal or external, due to damage or issues affecting the building, unavailability of staff, electrical failure. This could prevent important events such as exams and degree ceremonies, or external income-generating events, and have a major impact on the student experience and the reputation of the University.
L H A contingency plan is in place, which includes access to emergency staffing, generator etc. Alternative venues are identified but availability depends on their own schedules.
L M
1, 2, 4 Drama studio unavailable for use by external groups or internal teaching departments due to fire or other damage, loss of power or unavailability of staff. This could hinder teaching plans, prevent income-generating activity and damage our reputation.
Inability to confirm bookings to external groups, due to lack of clarity of internal bookings, could lead to loss of revenue and reputation.
L M A contingency plan is in place, which includes recognition that alternative spaces would be unlikely to be available.
Internal (teaching) bookings are discussed and clarified as soon as possible.
L M