7/27/2019 RISK ANAL
1/18
C. Mokkapati 1
A PRACTICAL RISK AND SAFETY ASSESSMENT METHODOLOGY FOR SAFETY-
CRITICAL SYSTEMS
Chinnarao Mokkapati
Ansaldo Signal
Union Switch & Signal Inc.
1000 Technology Drive
Pittsburgh, PA 15219
Abstract
This paper presents a practical methodology for a) assessment of risks associated with the
intended application of a safety-critical system, and b) verification that the system meets the
safety design requirements that enable the risks to be kept at acceptable levels throughout its
lifecycle. The methodology consists of the following steps: 1) Define the system and analyze its
intended operation to determine all potential hazards; 2) Analyze the risks (potential
consequences after considering the available procedural, circumstantial and physical risk
reduction barriers in the intended operation of the system); 3) Determine the tolerable hazard
rates for the system functions by comparing the remaining risks with industry-accepted tolerable
levels; 4) Apportion the tolerable hazard rates and corresponding safety integrity levels to
various subsystems/equipment within the system; and 5) Analyze the design of the subsystems/
equipment and the system to show that the tolerable hazard rates will not be exceeded, and that
the required levels of safety integrity (assurance against systematic failures) have been built into
the system. Suitability of the methodology for railroad signaling systems is shown with the help
of an example.
1.0 INTRODUCTION
When an organization such as a Railway desires to install a new product/system for the purpose
of improving the efficiency and/or safety of its operations, there must be verifiable proof that the
new product/system does indeed provide the desired improvements. Specific to safety, the
improvements should come in the form of a reduced level of risk (of accidents/mishaps) relative
to the current level of risk (if known), or relative to commonly-accepted tolerable risk levels.
This paper presents an approach that can be used for risk and safety assessment of a safety-
critical system. This approach, broadly based upon U.S. Military Standard 882C (1), AREMA
C&S Manual Section 17 (2), IEEE Standard 1483-2000 (3), and the CENELEC Standards
EN50126 (4), EN50128 (5), and EN50129 (6), has been used by the author's organization for the
assessment of Automatic Train Control Systems furnished for the Copenhagen Metro and the
Kuala Lumpur Monorail System. It can be applied in a practical manner to other systems such
as PTC Systems, Train Protection Warning Systems, Train Collision Avoidance Systems, etc.,
that use newer technologies and architectures for meeting defined risk and safety requirements.
The concepts of Safety Integrity Levels (SILs) and Tolerable Hazard Rates (THRs) are used in
this approach. Reference (6) provides a detailed description of the concepts of SILs and THRs.
Section 2 of this paper presents an overview of the risk and safety analysis methodology.
Section 3 presents details of risk analysis while Section 4 outlines the system design analysis that
provides proof that the system meets its safety requirements derived from the risk analysis.
Section 5 gives an example.
2.0 OVERVIEW OF RISK ANALYSIS AND SYSTEM DESIGN ANALYSIS
A methodology, derived from CENELEC Report prR009-004 (7), for risk analysis and system
design analysis is presented in this section. At the heart of this approach is a well-defined
interface between the operational environment and the architectural design of the system. From
the safety point of view this interface is defined by a list of hazards and tolerable hazard rates
associated with the system.
The general steps of the risk analysis and system design analysis methodology are shown in
Figure 1 and can be summarized as follows:
1. Define the system adequately.
2. Identify key operational hazards.
3. Determine the tolerable hazard rate (THR) for each hazard by analyzing the consequences
of the hazards (taking into account the operational parameters).
4. For each hazard: analyze the causes down to a functional level, taking into account the
system definition and architecture.
5. Decide which functions are implemented by which subsystem. Then, for each
subsystem:
   - Collect the contributions of each function realised by the subsystem to all
     hazards
   - Calculate overall tolerable hazard rate THRs for the subsystem
   - Translate THRs into a safety integrity level SILs for the subsystem using a SIL table
   - Determine failure rates for the system elements to meet THRs for the subsystem
   - Verify & validate that the THRs and SILs are met.
This methodology, shown in the flowchart of Figure 1, can be divided into two parts: Risk
Analysis, consisting of Steps 1-3, and System Design Analysis, consisting of Steps 4-5. Risk
Analysis deals with the real world of the system operation. System Design Analysis deals with
the technical solutions for managing the risks.
3.0 DETAILS OF RISK ANALYSIS
The Risk Analysis steps are shown in Figure 2.
3.1 System Definition
The system under investigation must be defined completely. This is typically done in the form
of the following documents:
- System Requirements Specification
- System Architecture Description
- System Design Description Documents
These documents should give details of the system's
- Functional Requirements
- Type of Operation (e.g., signaling principles)
- Operational Parameters (e.g., train schedules, speeds, density)
- System Boundaries
3.2 Hazard Identification
Through a structured Hazard Identification study (e.g., as described in AREMA C&S Manual
17.3.5), and based on existing data from the End User's sources, the potential hazards associated
with the intended operation of the system shall be identified and documented in a Hazard Log.
The following terminology is used:
1. An individual i uses the technical system (e.g., a train, a Level Crossing). The usage profile
is described by the number of uses N_i (per year or per hour). For reference, a total exposure
per use E_i (hours) may be defined (i.e., the duration of a train journey or the time needed to
pass a Level Crossing (LC)).
2. While using the technical system the individual is exposed to hazards arising from failure of
the technical system (or its subsystems etc.). Let there be n hazards associated with the
technical system. Let each hazard H_j have a hazard rate HR_j hazards/hour, j = 1, ..., n. The
tolerable value of each HR_j is what we are trying to determine through the Risk Analysis
process. The probability that the individual is exposed to the hazard depends additionally on
the hazard duration D_j and the exposure time E_ij of the individual to the hazard. This
probability is the sum of the probability that the hazard already exists when the
individual enters the system (approximately HR_j × D_j) and the probability that the hazard
occurs while the individual is exposed (approximately HR_j × E_ij). Note that the exposure to the
hazard H_j may be shorter than or equal to the total exposure: E_ij ≤ E_i.
3.3 Risk Determination
From each hazard one or several types of accidents may occur. This is described for each hazard
by the consequence probability C_jk that accident k occurs. Associated with each type of accident
A_k is a corresponding severity, which from the individual's point of view is described as the
probability of fatality F_ik for the single individual.
This chain of causality determines the individual risk of fatality:
$$ IRF_i = \sum_{\text{all hazards } H_j} N_i \left( HR_j \times (D_j + E_{ij}) \sum_{\text{accidents } A_k} C_{jk} \times F_{ik} \right) \qquad (1) $$
If, as a result, the IRF is less than the Tolerable Individual Risk (TIR), usually expressed in
fatalities per year, then the calculated or estimated hazard rates (HR) are called tolerable hazard
rates (THR).
In Formula (1), the individual probability of fatality F_ik can be calculated from the severity S_k
(e.g., number of fatalities) in accident k, out of a population of N_k exposed to accident k
(concept of collective measure of severity). That is,
$$ F_{ik} = S_k / N_k \qquad (2) $$
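As an illustration, Formulas (1) and (2) can be evaluated with a few lines of Python. This is only a sketch: the class and function names are hypothetical, and the numbers in the usage lines are made-up values chosen to keep the arithmetic easy to follow, not data from any railroad.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Accident:
    consequence_prob: float    # C_jk: probability that accident k follows hazard j
    fatalities: float          # S_k: severity, e.g. number of fatalities
    exposed_population: float  # N_k: population exposed to accident k

    @property
    def fatality_prob(self) -> float:
        # Formula (2): F_ik = S_k / N_k
        return self.fatalities / self.exposed_population


@dataclass
class Hazard:
    hazard_rate: float         # HR_j, hazards per hour
    duration_h: float          # D_j, hours the hazard persists
    exposure_h: float          # E_ij, hours the individual is exposed per use
    accidents: List[Accident]


def individual_risk_of_fatality(uses_per_year: float, hazards: List[Hazard]) -> float:
    """Formula (1): sum over hazards of N_i * HR_j * (D_j + E_ij) * sum_k(C_jk * F_ik)."""
    total = 0.0
    for hazard in hazards:
        loss = sum(a.consequence_prob * a.fatality_prob for a in hazard.accidents)
        total += uses_per_year * hazard.hazard_rate * (hazard.duration_h + hazard.exposure_h) * loss
    return total


# Illustrative (assumed) values: one hazard with a single accident type.
demo_hazard = Hazard(hazard_rate=1e-7, duration_h=10.0, exposure_h=0.0,
                     accidents=[Accident(consequence_prob=0.001, fatalities=1, exposed_population=100)])
irf = individual_risk_of_fatality(uses_per_year=1000, hazards=[demo_hazard])
print(f"IRF = {irf:.1e} fatalities per year")
```

The result would then be compared with the TIR; if it is below the TIR, the hazard rates used as input are tolerable.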
Note: Accident k could result in other types of potential losses, namely commercial loss and
environmental loss. It is possible to quantify these losses (convert them into an equivalent
number of fatalities) in order to include them in the term S_k in Equation (2). A discussion and
agreement with the User shall be needed in this regard.
3.4 Risk Tolerability Criteria and THR Determination
To determine the tolerable level of risk, either the GAMAB, the ALARP, or the MEM principle
can be used. Reference (8), a report by Dr. Hendrik Schäbe of the Institute for Software,
Electronics, Railroad Technology, TÜV InterTraffic GmbH, provides a detailed treatment of
these principles.
The GAMAB principle requires the risk of the new system to be no higher than that associated
with the system being replaced. An upper and a lower bound on TIR (fatality rate in fatalities
per year) can be derived from the ALARP principle. A single value of TIR can be derived from
the MEM principle.
The IRF_i in Formula (1) is now equated to the TIR in order to determine the tolerable value of
each hazard rate HR_j. These are denoted THR_j.
4.0 DETAILS OF SYSTEM DESIGN ANALYSIS
The System Design Analysis Process is shown in Figure 3.
The Risk Analysis detailed in Section 3.0 results in a list of n hazards H_1, ..., H_n together with
their tolerable hazard rates THR_1, ..., THR_n respectively.
Further analysis is then required to arrive at a suitable system architecture for the control of such
hazards. This is called System Design Analysis, which is essentially a causal analysis of the
hazards H_1, ..., H_n. It consists of the following tasks:
- Define the system functions and architecture (technical solution),
- Analyze the causes leading to each hazard,
- Determine the safety integrity requirements (SIL and hazard rates) for the subsystems,
- Determine the reliability requirements for the equipment.
Causal analysis of hazards comprises two key phases. In the first phase, each THR is apportioned
to a functional level (system functions). The hazard rate for a function is then translated to a SIL
using the SIL table below, taken from (6). The SILs are defined at this functional level for the
subsystems implementing the functionality.

Tolerable Hazard Rate (THR) per hour and per function    Safety Integrity Level
THR < 10^-8                                               4
10^-8 < THR < 10^-7                                       3
10^-7 < THR < 10^-6                                       2
10^-6 < THR < 10^-5                                       1
A sub-system, i.e., a combination of equipment, may implement a number of Safety-Related
Functions, each of which could require a different SIL. Where this is the case, the sub-system
must be designed to meet the highest Safety Integrity Level of those functions.
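The SIL table and the "highest SIL wins" rule lend themselves to a small lookup, sketched here in Python. The function names and the example function THRs are hypothetical. The table as printed does not say which band a boundary value such as exactly 10^-8 falls into; this sketch assumes a value on a boundary belongs to the less demanding (higher-THR) band.

```python
# Upper THR bound (per hour, per function) of each band in the SIL table, with its SIL.
SIL_BANDS = [(1e-8, 4), (1e-7, 3), (1e-6, 2), (1e-5, 1)]


def sil_for_thr(thr_per_hour: float) -> int:
    """Translate a function's tolerable hazard rate into a Safety Integrity Level."""
    if thr_per_hour <= 0:
        raise ValueError("THR must be positive")
    for upper_bound, sil in SIL_BANDS:
        if thr_per_hour < upper_bound:  # assumption: a boundary value goes to the next band
            return sil
    raise ValueError("THR of 1e-5 per hour or more is outside the SIL table")


def subsystem_sil(function_thrs: dict) -> int:
    """A subsystem must meet the highest SIL among the functions it implements."""
    return max(sil_for_thr(thr) for thr in function_thrs.values())


# Hypothetical subsystem implementing two safety-related functions.
example_functions = {"emergency_brake": 2e-7, "driver_warning": 3e-6}
print({name: sil_for_thr(thr) for name, thr in example_functions.items()})
print("Subsystem SIL:", subsystem_sil(example_functions))
```

With these assumed inputs the two functions map to SIL 2 and SIL 1, so the subsystem as a whole would be designed to SIL 2.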
In the second phase of the causal analysis, the hazard rates for subsystems are further
apportioned, leading to failure rates for the equipment, but at this physical or implementation
level the SIL remains unchanged. Consequently, the software SIL defined in (5) would also be
the same as the subsystem SIL, except as described in clause 5.2.3 of (5).
The apportionment process may be performed by any method which allows a suitable
representation of the combinational logic, e.g., reliability block diagrams, failure modes &
effects analyses, fault trees, binary decision diagrams, Markov models, etc. In any case,
particular care must be taken when independence of items is required. While in the first part of
the Causal Analysis functional independence is required (i.e., the failure of functions shall be
independent with respect to systematic and random faults), physical independence is sufficient in
the second part (i.e., the failure of subsystems shall be independent with respect to random
faults). Assumptions made in the causal analysis must be checked and may lead to safety-
relevant application rules for the implementation.
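To make the apportionment idea concrete, the following Python sketch combines rates through fault-tree gates and checks the result against a THR budget. None of it comes from the paper: the OR-gate sum is the usual rare-event approximation, the AND-gate expression (rate_a × rate_b × (t_a + t_b)) is a commonly used approximation for two independent items whose failures are detected and negated within t_a and t_b hours, and all names and numbers are assumed for illustration.

```python
from typing import Iterable


def or_gate_rate(rates_per_hour: Iterable[float]) -> float:
    """Rare-event approximation: any single input event causes the output event."""
    return sum(rates_per_hour)


def and_gate_rate(rate_a: float, rate_b: float, detect_a_h: float, detect_b_h: float) -> float:
    """Approximate rate at which two independent items are failed at the same time.

    detect_a_h and detect_b_h are the times (hours) within which each failure
    is detected and negated; independence is an assumption that must be justified.
    """
    return rate_a * rate_b * (detect_a_h + detect_b_h)


# Assumed figures for a two-channel subsystem with a residual common-cause term.
channel_rate = 1e-5        # wrong-side failure rate of one channel, per hour
detection_time_h = 1.0     # each channel's failures are negated within one hour
common_cause_rate = 1e-9   # failures that defeat both channels at once, per hour

dual_channel = and_gate_rate(channel_rate, channel_rate, detection_time_h, detection_time_h)
subsystem_rate = or_gate_rate([dual_channel, common_cause_rate])

thr_budget = 2e-8          # hypothetical share of a THR apportioned to this subsystem
meets_budget = subsystem_rate <= thr_budget
print(f"subsystem hazard rate = {subsystem_rate:.2e} per hour, meets budget: {meets_budget}")
```

The common-cause term is modelled as a separate OR input because an AND gate only credits redundancy where the inputs are truly independent, which is the point the paragraph above makes about independence.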
System design analysis is essentially a combination of various qualitative and quantitative hazard
analyses and safety verification & validation steps. A disciplined approach to system design
analysis using a structured Safety Assurance Program (e.g., as outlined in AREMA C&S Manual
Part 17.3.1) is recommended.
5.0 EXAMPLE
A hypothetical Train Protection Warning System (TPWS) shown in Figure 4 is used as an
example for detailing the steps involved in the Risk Analysis. The Safety Analysis portion is not
covered in detail for this hypothetical system.
The desired functions of the TPWS are a) Provide Emergency Brake application to prevent
Signals Passed at Danger (SPADs), and b) Provide driver warning and speed supervision with
ability to stop the train if an overspeed condition is ignored by the driver. This system is intended
to be used on a Railroad with heavy passenger train traffic, and the goal is to reduce the risk of
fatalities due to SPADs to a tolerable level. The following steps are as outlined in Section 3.
The quantitative numbers used in the example calculations are the author's assumed data and are
not reflective of any particular Railroad's statistics.
HAZARD H1: TPWS fails to prevent a SPAD that could result in a collision and ensuing
fatalities.
RISK ANALYSIS
1. Determine Risk Tolerability
A reasonably practical scheme shall be implemented with the aim of ensuring that train collisions
due to SPADs pose a risk of fatality no higher than 1 in 1,000,000 per year. That is,
Tolerable Individual Risk (TIR) ≤ 10^-6 per year (risk of SPAD-caused fatality to the train
driver, also assumed to be the same for a passenger if the train involved in the event is a
passenger-carrying train)
2. Determine Risk Exposure
N_i = Number of times/year train i passes signals = 10,000
D_1 = Duration of Hazard H1 = 10 hours (a pessimistic estimate)
Hazard H1 exists when the TPWS has a wrong-side (hazardous) failure that remains non-negated
or un-repaired.
Hazard H1 has a hazard rate of HR_1 failures/hour.
The goal is to determine this HR_1 before the design of the TPWS can proceed.
E_i1 = Exposure time of the train to Hazard H1 (time taken by the train to pass a signal at a failed
TPWS location; very short relative to D_1, so it is ignored)
3. Cause-Consequence Analysis
Done in the form of an Event Tree Analysis (ETA), as shown in Figure 5.
4. Loss Analysis
From the ETA, two types of accidents and their probabilities of occurrence are determined and
listed below. For the sake of simplicity, assume the probabilities of fatality in each accident as
shown below.
No. (k)   Accident (A_k)          Probability of Occurrence (C_1k)   Probability of Fatality (F_ik)
1         High Speed Collision    0.00005                            0.9
2         Low Speed Collision     0.00001                            0.5
5. Determine THR
Substitute the above values in Equation (1):
$$ IRF_i = N_i \left\{ HR_1 \times (D_1 + E_{i1}) \sum_k (C_{1k} \times F_{ik}) \right\} $$
$$ = 10{,}000 \times HR_1 \times 10 \times (0.00005 \times 0.9 + 0.00001 \times 0.5) = 5 \times HR_1 \le TIR = 10^{-6} $$
This results in HR_1 = 2×10^-7 failures/hour, which is now called THR_1.
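The arithmetic of this step can be reproduced with a short Python sketch. The variable names are hypothetical; the numbers are the example's own assumed data, with E_i1 set to zero because the example ignores it.

```python
uses_per_year = 10_000        # N_i: signals passed per year by train i
hazard_duration_h = 10.0      # D_1: pessimistic duration of Hazard H1, hours
exposure_h = 0.0              # E_i1: ignored in the example as very short relative to D_1
accidents = [                 # (C_1k, F_ik) pairs from the Loss Analysis table
    (0.00005, 0.9),           # high speed collision
    (0.00001, 0.5),           # low speed collision
]
tir_per_year = 1e-6           # Tolerable Individual Risk

# Expected fatality probability per occurrence of the hazard: sum of C_1k * F_ik.
loss_per_hazard = sum(c * f for c, f in accidents)

# Equation (1) with HR_1 factored out: IRF_i = risk_per_unit_rate * HR_1.
risk_per_unit_rate = uses_per_year * (hazard_duration_h + exposure_h) * loss_per_hazard

# Setting IRF_i equal to TIR gives the tolerable hazard rate.
thr_1 = tir_per_year / risk_per_unit_rate
print(f"THR_1 = {thr_1:.1e} failures/hour")  # prints: THR_1 = 2.0e-07 failures/hour
```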
SYSTEM DESIGN ANALYSIS
Apportion THR_1 to individual pieces of equipment in the TPWS by using Failure Modes and
Effects Analysis (FMEA) and Fault Tree Analysis (FTA) techniques. Guidance given in
AREMA C&S Manual Part 17.3.3 (2) and IEEE Std 1483 (3) can be used. Make sure physical,
functional and process dependencies within the TPWS equipment are properly handled with the
use of AND gates in the FTA. An iterative approach is needed to arrive at a cost-effective
design.
Different parts of the TPWS equipment may end up being designed to different SILs for
systematic failure integrity.
6.0 CONCLUSIONS
A practical methodology for risk and safety analysis using the concepts of tolerable risk, safety integrity
levels, and tolerable hazard rates is presented in this paper with the help of a simple example. This
methodology can be applied to signaling and train control systems that use new technologies and
architectures, and is expected to provide a cost-effective approach to both design and assessment of such
systems.
7.0 REFERENCES
(1) United States Department of Defense (January 19, 1993) Military Standard: MIL-STD-882C -
System Safety Program Requirements.
(2) AREMA Communications & Signal Manual, Section 17: Quality Principles. Parts 17.3.1
(2004), 17.3.3 (2004), and 17.3.5 (2004).
(3) IEEE Standard 1483-2000: Verification of Vital Functions for Processor-Based Systems
Used in Signal and Train Control.
(4) CENELEC Standard EN 50126: Railway Applications - The Specification and
Demonstration of Dependability, Reliability, Availability, Maintainability and Safety
(RAMS). Issue: March 2000.
(5) CENELEC Standard EN 50128: Railway Applications - Communications, signaling and
processing systems - Software for railway control and protection systems. Issue: March
2001.
(6) CENELEC Standard EN 50129: Railway Applications - Communications, signaling and
processing systems - Safety related electronic systems for signaling. Issue: May 2002.
(7) CENELEC Report prR009-004: Railway Applications - Systematic Allocation of Safety
Integrity Requirements (March 1999).
(8) Different Approaches For Determination Of Tolerable Hazard Rates, by Dr. Hendrik
Schäbe, Institute for Software, Electronics, Railroad Technology, TÜV InterTraffic
GmbH, 51105 Köln.
List of Figures in the Paper "A Practical Risk and Safety Assessment Methodology for
Safety-Critical Systems"
Figure 1. Risk and Safety Analysis Overview (From Reference (4))
Figure 2. Process Details of Risk Analysis (From Reference (4))
Figure 3. System Design Analysis Summary (From Reference (4))
Figure 4. A Simple Train Protection Warning System
Figure 5. Cause-Consequence Analysis (Determination of External Risk Reduction)
[Figure 1: flowchart with Input, Activity and Output columns.
Activities: 1. Define System (functions, boundary, interfaces, environment, ...); 2. Identify
(system) hazards; 3. Analyze consequences of hazards; 4. Analyze causes of hazards and identify
additional hazards; 5. Allocate Safety Integrity Requirements to subsystems/equipment (iterate
until system element level). Activities 1-3 are labeled Risk Analysis and activities 4-5 System
Design Analysis.
Other box labels: System definition, Risk tolerability criteria (Safety), (Sub-) System
Architecture, Hazard Log, System Requirements Specification, Hazard Analysis, Subsystem
Requirements Specification, top level hazards, Risk, THRs, SILs, Failure Rates.]
Figure 1. Risk and Safety Analysis Overview (From Reference (4))
[Figure 2: flowchart. Process boxes: System Definition; Analyze Operation; Identify Hazards;
Estimate Hazard Rates; Identify Consequences (Accidents, Near Misses, Safe State); Determine
Risk; Determine THR; System Design Analysis.
Other box labels: System Requirements Specification (Safety Requirements), Hazard Log, Risk
Tolerability Criteria (Safety).]
Figure 2. Process Details of Risk Analysis (From Reference (4))
[Figure 3: flowchart. Inputs: Hazards H_1, ..., H_n and their tolerable hazard rates; System
Architecture; SIL Table.
For Each Hazard: use FMEAs, FTAs, Reliability Block Diagrams, Binary Decision Diagrams, Markov
models, etc. as appropriate. For each AND: Common Cause Failure Analysis; fault detection
mechanism and time; safety-related application conditions.
For Each Subsystem: 1. Collect contributions to hazards; 2. Determine THR and SIL (output: SIL
and THR for subsystems). Apportion failure rates to elements (output: SIL and THR for elements).
Conduct Verification & Validation of SILs and THRs.]
Figure 3. System Design Analysis Summary (From Reference (4))
[Figure 4: block diagram with the following numbered components.]
1. Onboard Computer (OBC)
2. Transponder Transmission Module
3. Transponder Antenna
4. Driver's Console
5. Tachometer
6. Emergency Brake Interface
7. Signal Control Logic
8. Lineside Electronic Unit
9. Transponder
BASIC FUNCTIONALITY DESIRED:
- Provide driver warning, then Emergency Brake Application to prevent Signal Passed at
Danger.
- Provide driver warning and speed supervision with ability to stop train if overspeed
condition is ignored by the driver.
Figure 4. A Simple Train Protection Warning System
[Figure 5: event tree. Initiating event: Train approaches a Signal at Danger, with Hazard H1
present. Yes/No branch events: Engineer passes Signal at Danger; Engineer notices obstruction,
starts braking, but can't stop short of obstruction; Engineer does not notice obstruction,
plows ahead. Branch probabilities shown in the figure: 0.001, 0.1, 0.5, 0.2.
Outcomes: High Speed Collision 0.00005; Low Speed Collision 0.00001; Safe State 0.99994.]
Figure 5. Cause-Consequence Analysis (Determination of External Risk Reduction)