risk anal

Upload: saospie

Post on 14-Apr-2018


  • 7/27/2019 RISK ANAL

    1/18

    C. Mokkapati 1

    A PRACTICAL RISK AND SAFETY ASSESSMENT METHODOLOGY FOR SAFETY-CRITICAL SYSTEMS

    Chinnarao Mokkapati

    Ansaldo Signal

    Union Switch & Signal Inc.

    1000 Technology Drive

    Pittsburgh, PA 15219

    Abstract

    This paper presents a practical methodology for a) assessment of risks associated with the

    intended application of a safety-critical system, and b) verification that the system meets the

    safety design requirements that enable the risks to be kept at acceptable levels throughout its

    lifecycle. The methodology consists of the following steps: 1) Define the system and analyze its

    intended operation to determine all potential hazards; 2) Analyze the risks (potential

    consequences after considering the available procedural, circumstantial and physical risk

    reduction barriers in the intended operation of the system); 3) Determine the tolerable hazard

    rates for the system functions by comparing the remaining risks with industry-accepted tolerable

    levels; 4) Apportion the tolerable hazard rates and corresponding safety integrity levels to

    various subsystems/equipment within the system; and 5) Analyze the design of the subsystems/

    equipment and the system to show that the tolerable hazard rates will not be exceeded, and that

    the required levels of safety integrity (assurance against systematic failures) have been built into

    the system. Suitability of the methodology for railroad signaling systems is shown with the help

    of an example.

    1.0 INTRODUCTION

    When an organization such as a Railway desires to install a new product/system for the purpose

    of improving the efficiency and/or safety of its operations, there must be verifiable proof that the

    new product/system does indeed provide the desired improvements. Specific to safety, the

    improvements should come in the form of a reduced level of risk (of accidents/mishaps) relative

    to the current level of risk (if known), or relative to commonly-accepted tolerable risk levels.

    This paper presents an approach that can be used for risk and safety assessment of a safety-

    critical system. This approach, broadly based upon U.S. Military Standard 882C (1), AREMA

    C&S Manual Section 17 (2), IEEE Standard 1483-2000 (3), and the CENELEC Standards

    EN50126 (4), EN50128 (5), and EN50129 (6), has been used by the author's organization for the

    assessment of Automatic Train Control Systems furnished for the Copenhagen Metro and the

    Kuala Lumpur Monorail System. It can be applied in a practical manner for other systems such as PTC Systems, Train Protection Warning Systems, Train Collision Avoidance Systems, etc.,

    that use newer technologies and architectures for meeting defined risk and safety requirements.

    The concepts of Safety Integrity Levels (SILs) and Tolerable Hazard Rates (THRs) are used in

    this approach. Reference (6) provides a detailed description of the concepts of SILs and THRs.

    Section 2 of this paper presents an overview of the risk and safety analysis methodology.

    Section 3 presents details of risk analysis while Section 4 outlines the system design analysis that

    provides proof that the system meets its safety requirements derived from the risk analysis.

    Section 5 gives an example.

    2.0 OVERVIEW OF RISK ANALYSIS AND SYSTEM DESIGN ANALYSIS

    A methodology, derived from CENELEC Report prR009-004 (7), for risk analysis and system

    design analysis is presented in this section. At the heart of this approach is a well-defined

    interface between the operational environment and the architectural design of the system. From

    the safety point of view this interface is defined by a list of hazards and tolerable hazard rates

    associated with the system.

    The general steps of the risk analysis and system design analysis methodology are shown in

    Figure 1 and can be summarized as follows:

    1. Define the system adequately.

    2. Identify key operational hazards.

    3. Determine the tolerable hazard rate (THR) for each hazard by analyzing the consequences of the hazards (taking into account the operational parameters).

    4. For each hazard, analyze the causes down to a functional level, taking into account the system definition and architecture.

    5. Decide which functions are implemented by which subsystem. Then, for each subsystem:

       - Collect the contributions to all hazards of each function realized by the subsystem.

       - Calculate the overall tolerable hazard rate THR_S for the subsystem.

       - Translate THR_S into a safety integrity level SIL_S for the subsystem using a SIL table.

       - Determine failure rates for the system elements to meet THR_S for the subsystem.

       - Verify & validate that the THRs and SILs are met.

    This methodology, shown in the flowchart of Figure 1, can be divided into two parts: Risk

    Analysis, consisting of Steps 1-3, and System Design Analysis, consisting of Steps 4-5. Risk

    Analysis deals with the real world of the system operation. System Design Analysis deals with

    the technical solutions for managing the risks.

    3.0 DETAILS OF RISK ANALYSIS

    The Risk Analysis steps are shown in Figure 2.

    3.1 System Definition

    The system under investigation must be defined completely. This is typically done in the form of the following documents:

    - System Requirements Specification

    - System Architecture Description

    - System Design Description Documents

    These documents should give details of the system's:

    - Functional Requirements

    - Type of Operation (e.g., signaling principles)

    - Operational Parameters (e.g., train schedules, speeds, density, ...)

    - System Boundaries

    3.2 Hazard Identification

    Through a structured Hazard Identification study (e.g., as described in AREMA C&S Manual

    17.3.5), and based on existing data from the End User's sources, the potential hazards associated

    with the intended operation of the system shall be identified and documented in a Hazard Log.

    The following terminology is used:

    1. An individual i uses the technical system (e.g., a train or a Level Crossing (LC)). The usage profile is described by the number of uses Ni (per year or per hour). For reference, a total exposure per use Ei (hours) may be defined (i.e., the duration of a train journey or the time needed to pass an LC).

    2. While using the technical system, the individual is exposed to hazards arising from failure of the technical system (or its subsystems, etc.). Let there be n hazards associated with the technical system. Let each hazard Hj have a hazard rate HRj hazards/hour, j = 1, ..., n. The tolerable value of each HRj is what we are trying to determine through the Risk Analysis process. The probability that the individual is exposed to the hazard depends additionally on the hazard duration Dj and the exposure time Eij of the individual to the hazard. This probability is the sum of the probability that the hazard already exists when the individual enters the system (approximately HRj × Dj) and the probability that the hazard occurs while the individual is exposed (approximately HRj × Eij). Note that the exposure to the hazard Hj may be shorter than or equal to the total exposure: Eij ≤ Ei.
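    The two-term approximation in item 2 can be sketched in a few lines of Python. This is only an illustration: the function name and the numeric values are hypothetical and are not taken from the paper, and the approximation is reasonable only while HRj × (Dj + Eij) is much smaller than 1.

```python
def exposure_probability(hr_per_hour, duration_h, exposure_h):
    """Approximate probability that a single use of the system meets hazard Hj.

    hr_per_hour : hazard rate HRj (hazards/hour)
    duration_h  : hazard duration Dj (hours)
    exposure_h  : exposure time Eij of the individual (hours)
    """
    already_present = hr_per_hour * duration_h     # hazard exists on entry
    occurs_during_use = hr_per_hour * exposure_h   # hazard arises while exposed
    return already_present + occurs_during_use


# Hypothetical numbers, chosen only to show the orders of magnitude involved.
p = exposure_probability(hr_per_hour=1e-6, duration_h=10.0, exposure_h=0.5)
print(p)
```

    When Eij is much shorter than Dj, the second term is negligible and the probability reduces to roughly HRj × Dj.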

    3.3 Risk Determination

    From each hazard one or several types of accidents may occur. This is described for each hazard Hj by the consequence probability Cjk that accident k occurs. Associated with each type of accident Ak is a corresponding severity, which from the individual's point of view is described as the probability of fatality Fik for the single individual.

    This causality corresponds one to one to the individual risk of fatality (IRF):

    $$\mathrm{IRF}_i = \sum_{\text{all hazards } H_j} N_i \left( HR_j \times (D_j + E_{ij}) \times \sum_{\text{accidents } A_k} C_{jk} \times F_{ik} \right) \qquad (1)$$

    If, as a result, the IRF is less than the Tolerable Individual Risk (TIR), usually expressed in

    fatalities per year, then the calculated or estimated hazard rates (HR) are called tolerable hazard

    rates (THR).
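    As a rough sketch of how Formula (1) and the tolerability check could be evaluated, the Python below sums the risk contribution of each hazard and compares the total with a TIR. The class names, the function name, and every numeric value are hypothetical and are not data from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Accident:
    c: float  # consequence probability Cjk that accident k follows hazard j
    f: float  # probability of fatality Fik for the individual in accident k


@dataclass
class Hazard:
    hr: float                  # hazard rate HRj (hazards/hour)
    d: float                   # hazard duration Dj (hours)
    e: float                   # exposure time Eij (hours)
    accidents: List[Accident]  # accidents that can follow this hazard


def individual_risk_of_fatality(n_uses, hazards):
    """Evaluate Formula (1): sum over hazards of Ni * HRj * (Dj + Eij) * sum_k(Cjk * Fik)."""
    total = 0.0
    for h in hazards:
        loss = sum(a.c * a.f for a in h.accidents)
        total += n_uses * h.hr * (h.d + h.e) * loss
    return total


# Hypothetical inputs: 1,000 uses per year and two hazards.
hazards = [
    Hazard(hr=1e-6, d=5.0, e=0.1, accidents=[Accident(c=0.001, f=0.5)]),
    Hazard(hr=2e-6, d=2.0, e=0.0,
           accidents=[Accident(c=0.01, f=0.1), Accident(c=0.001, f=1.0)]),
]
irf = individual_risk_of_fatality(n_uses=1000, hazards=hazards)
tir = 1e-6  # assumed tolerable individual risk, fatalities per year
tolerable = irf <= tir
print(irf, tolerable)
```

    With these invented inputs the IRF exceeds the assumed TIR, so the hazard rates would have to be lowered before they could be called tolerable.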

    In Formula (1), the individual probability of fatality Fik can be calculated from the severity Sk (e.g., number of fatalities) in accident k, out of a population of Nk exposed to accident k (concept of collective measure of severity). That is,

    $$F_{ik} = S_k / N_k \qquad (2)$$

    Note: Accident k could result in other types of potential losses, namely commercial loss and

    environmental loss. It is possible to quantify these losses (convert them into an equivalent

    number of fatalities) in order to include them in the term Sk in Equation (2). A discussion and

    agreement with the User shall be needed in this regard.

    3.4 Risk Tolerability Criteria and THR Determination

    To determine the tolerable level of risk, either the GAMAB, the ALARP, or the MEM principle

    can be used. Reference (8), a report by Dr. Hendrik Schäbe of the Institute for Software,

    Electronics, Railroad Technology, TÜV InterTraffic GmbH, provides a detailed treatment of

    these principles.

    The GAMAB principle requires the risk of the new system to be no higher than that associated

    with the system being replaced. An upper and a lower bound on TIR (fatality rate in fatalities

    per year) can be derived from the ALARP principle. A single value of TIR can be derived from

    the MEM principle.

    The IRFi in Formula (1) is now equated to the TIR in order to determine the tolerable value of

    each hazard rate HRj. These are denoted THRj.
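    For a single hazard Hj considered in isolation (an assumption made here to keep the algebra short), equating Formula (1) to the TIR and solving for the hazard rate gives:

$$N_i \times THR_j \times (D_j + E_{ij}) \times \sum_{k} C_{jk} \times F_{ik} = TIR
\quad\Longrightarrow\quad
THR_j = \frac{TIR}{N_i \times (D_j + E_{ij}) \times \sum_{k} C_{jk} \times F_{ik}}$$

    When several hazards contribute to the same individual risk, the TIR would first have to be shared among them. The text does not prescribe how that split is made, so it is left open here.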

    4.0 DETAILS OF SYSTEM DESIGN ANALYSIS

    The System Design Analysis Process is shown in Figure 3.

    The Risk Analysis detailed in Section 3.0 results in a list of n hazards H1, ..., Hn together with their tolerable hazard rates THR1, ..., THRn respectively.

    Further analysis is then required to arrive at a suitable system architecture for the control of such hazards. This is called System Design Analysis, which is essentially a causal analysis of the hazards H1, ..., Hn. It consists of the following tasks:

    - Define the system functions and architecture (technical solution),

    - Analyze the causes leading to each hazard,

    - Determine the safety integrity requirements (SIL and hazard rates) for the subsystems,

    - Determine the reliability requirements for the equipment.

    Causal analysis of hazards comprises two key phases. In a first phase, each THR is apportioned

    to a functional level (system functions). The hazard rate for a function is then translated to a SIL

    using the SIL table below, taken from (6). The SILs are defined at this functional level for the

    subsystems implementing the functionality.

    Tolerable Hazard Rate (THR)       Safety Integrity Level (SIL)
    per hour and per function
    ---------------------------       ----------------------------
    THR < 10^-8                       4
    10^-8 < THR < 10^-7               3
    10^-7 < THR < 10^-6               2
    10^-6 < THR < 10^-5               1
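    The table lookup can be sketched as a small Python function. The function name is hypothetical. The table as printed leaves the exact boundary values (10^-8, 10^-7, 10^-6) unassigned, so treating each lower bound as inclusive is an assumption of this sketch, as is returning None for rates outside the table.

```python
def thr_to_sil(thr_per_hour):
    """Map a tolerable hazard rate (per hour, per function) to a SIL.

    Assumption: a THR exactly on a band boundary belongs to the lower SIL band.
    Returns None when the THR is 1e-5 or higher, i.e. outside the table.
    """
    if thr_per_hour <= 0:
        raise ValueError("THR must be positive")
    if thr_per_hour < 1e-8:
        return 4
    if thr_per_hour < 1e-7:
        return 3
    if thr_per_hour < 1e-6:
        return 2
    if thr_per_hour < 1e-5:
        return 1
    return None


print(thr_to_sil(5e-9))  # falls in the SIL 4 band
print(thr_to_sil(3e-7))  # falls in the SIL 2 band
```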

    A sub-system, i. e. a combination of equipment, may implement a number of Safety-Related

    Functions, each of which could require a different SIL. Where this is the case, the sub-system

    must be designed to meet the highest Safety Integrity Level of those functions.
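    The highest-SIL rule can be illustrated as follows. The function names and the SIL values assigned to them are invented for this sketch and do not come from the paper.

```python
def subsystem_sil(function_sils):
    """Return the SIL a sub-system must meet: the highest SIL among its functions."""
    if not function_sils:
        raise ValueError("sub-system implements no safety-related functions")
    return max(function_sils.values())


# Hypothetical allocation of SILs to the functions of one sub-system.
onboard_functions = {
    "emergency_brake_on_spad": 4,
    "overspeed_warning": 2,
}
required_sil = subsystem_sil(onboard_functions)
print(required_sil)
```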

    In the second phase of the causal analysis, the hazard rates for subsystems are further

    apportioned, leading to failure rates for the equipment, but at this physical or implementation

    level the SIL remains unchanged. Consequently, the software SIL defined in (5) would also be

    the same as the subsystem SIL, except as described in clause 5.2.3 of (5).

    The apportionment process may be performed by any method which allows a suitable

    representation of the combinational logic, e.g., reliability block diagrams, failure modes &

    effects analyses, fault trees, binary decision diagrams, Markov models, etc. In any case,

    particular care must be taken when independence of items is required. While in the first part of

    the Causal Analysis functional independence is required (i.e., the failure of functions shall be

    independent with respect to systematic and random faults), physical independence is sufficient in

    the second part (i.e., the failure of subsystems shall be independent with respect to random

    faults). Assumptions made in the causal analysis must be checked and may lead to safety-

    relevant application rules for the implementation.
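    As one possible illustration of apportionment with fault-tree logic, the sketch below combines hazard rates through OR and AND gates. The formulas are common rare-event approximations and are not taken from the paper: an OR gate adds the input rates, and an AND gate over two independent items multiplies their rates by the sum of their fault detection times. Both hold only if the inputs really are independent and each rate × time product is much smaller than 1. All names and numbers are hypothetical.

```python
def or_gate(rates_per_hour):
    """Any single input failure causes the output event (rare-event approximation)."""
    return sum(rates_per_hour)


def and_gate(rate_a, rate_b, detect_a_h, detect_b_h):
    """Both independent items must be failed at the same time.

    A failure of one item persists for its detection time, during which the
    other item may fail; the two orderings give rate_a * rate_b * (t_a + t_b).
    """
    return rate_a * rate_b * (detect_a_h + detect_b_h)


# Hypothetical architecture: two redundant channels plus one single-point item.
redundant_pair = and_gate(rate_a=1e-5, rate_b=1e-5, detect_a_h=1.0, detect_b_h=1.0)
top_event_rate = or_gate([redundant_pair, 1e-9])
print(top_event_rate)
```

    If the independence assumption cannot be justified, for example because of a common-cause failure, the AND-gate result is optimistic and should not be relied upon.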

    System design analysis is essentially a combination of various qualitative and quantitative hazard

    analyses and safety verification & validation steps. A disciplined approach to system design

    analysis using a structured Safety Assurance Program (e.g., as outlined in AREMA C&S Manual

    Part 17.3.1) is recommended.

    5.0 EXAMPLE

    A hypothetical Train Protection Warning System (TPWS) shown in Figure 4 is used as an

    example for detailing the steps involved in the Risk Analysis. The Safety Analysis portion is not

    covered in detail for this hypothetical system.

    The desired functions of the TPWS are a) Provide Emergency Brake application to prevent

    Signals Passed at Danger (SPADs), and b) Provide driver warning and speed supervision with

    ability to stop the train if overspeed condition is ignored by the driver. This system is intended

    to be used on a Railroad with heavy passenger train traffic, and the goal is to reduce the risk of

    fatalities due to SPADs to a tolerable level. The following steps are as outlined in Section 3.

    The quantitative numbers used in the example calculations are the author's assumed data and are

    not reflective of any particular Railroad's statistics.

    HAZARD H1: TPWS fails to prevent a SPAD that could result in a collision and ensuing

    fatalities.

    RISK ANALYSIS

    1. Determine Risk Tolerability

    A reasonably practical scheme shall be implemented with the aim of ensuring that train collisions

    due to SPADs pose a risk of fatality no higher than 1 in 1,000,000 per year. That is,

    Tolerable Individual Risk (TIR) = 10^-6 per year (Risk of SPAD-caused fatality to the train

    driver, also assumed to be the same for a passenger if the train involved in the event is a

    passenger-carrying train)

    2. Determine Risk Exposure

    Ni = Number of times/year train i passes signals = 10,000

    D1 = Duration of Hazard H1 = 10 hours (a pessimistic estimate)

    Hazard H1 exists when the TPWS has a wrong-side (hazardous) failure that remains non-negated or un-repaired.

    Hazard H1 has a hazard rate of HR1 failures/hour. The goal is to determine this HR1 before the design of the TPWS can proceed.

    Ei1 = Exposure time of the train to Hazard H1 (time taken by the train to pass a signal at a failed TPWS location; very short relative to D1, and therefore ignored)

    3. Cause-Consequence Analysis

    Done in the form of an Event Tree Analysis (ETA), as shown in Figure 5.

    4. Loss Analysis

    From the ETA, two types of accidents and their probabilities of occurrence are determined and

    listed below. For the sake of simplicity, assume the probabilities of fatality in each accident as

    shown below.

    No. (k)   Accident (Ak)          Probability of Occurrence (C1k)   Probability of Fatality (Fik)
    -------   --------------------   -------------------------------   -----------------------------
    1         High Speed Collision   0.00005                           0.9
    2         Low Speed Collision    0.00001                           0.5

    5. Determine THR

    Substitute the above values in Equation (1):

    $$\mathrm{IRF}_i = N_i \left\{ HR_1 \times (D_1 + E_{i1}) \times \sum_k (C_{1k} \times F_{ik}) \right\}$$

    $$= 10{,}000 \times HR_1 \times 10 \times (0.00005 \times 0.9 + 0.00001 \times 0.5) \le TIR = 10^{-6}$$

    This results in HR1 = 2 × 10^-7 failures/hour, which is now called THR1.
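    The arithmetic of this step can be checked with a few lines of Python. The numbers are the assumed data of the example; the variable names are introduced here only for illustration, and Ei1 is set to zero because the text ignores it.

```python
n_i = 10_000      # Ni: signal passes per year for train i
d1_h = 10.0       # D1: duration of hazard H1 in hours
e_i1_h = 0.0      # Ei1: ignored in the example, so taken as zero
tir = 1e-6        # tolerable individual risk, fatalities per year

# (C1k, Fik) pairs from the loss analysis table.
accidents = [
    (0.00005, 0.9),  # high speed collision
    (0.00001, 0.5),  # low speed collision
]

# Sum over accidents of C1k * Fik, then solve Equation (1) for the hazard rate.
loss = sum(c * f for c, f in accidents)
thr1 = tir / (n_i * (d1_h + e_i1_h) * loss)

print(f"THR1 = {thr1:.1e} failures/hour")  # prints: THR1 = 2.0e-07 failures/hour
```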

    SYSTEM DESIGN ANALYSIS

    Apportion THR1 to individual pieces of equipment in the TPWS by using Failure Modes and

    Effects Analysis (FMEA) and Fault Tree Analysis (FTA) techniques. Guidance given in

    AREMA C&S Manual Parts 17.3.3 (2) and IEEE Std 1483 (3) can be used. Make sure physical,

    functional and process dependencies within the TPWS equipment are properly handled with the

    use of AND gates in the FTA. An iterative approach is needed to arrive at a cost-effective design.

    Different parts of the TPWS equipment may end up being designed to different SILs for

    systematic failure integrity.

    6.0 CONCLUSIONS

    A practical methodology for risk and safety analysis using the concepts of tolerable risk, safety integrity

    levels, and tolerable hazard rates is presented in this paper with the help of a simple example. This

    methodology can be applied to signaling and train control systems that use new technologies and

    architectures, and is expected to provide a cost-effective approach to both design and assessment of such

    systems.

    7.0 REFERENCES

    (1) United States Department of Defense (January 19, 1993). Military Standard MIL-STD-882C: System Safety Program Requirements.

    (2) AREMA Communications & Signal Manual, Section 17: Quality Principles. Parts 17.3.1 (2004), 17.3.3 (2004), and 17.3.5 (2004).

    (3) IEEE Standard 1483-2000: Verification of Vital Functions for Processor-Based Systems Used in Signal and Train Control.

    (4) CENELEC Standard EN 50126: Railway Applications - The Specification and Demonstration of Dependability, Reliability, Availability, Maintainability and Safety (RAMS). Issue: March 2000.

    (5) CENELEC Standard EN 50128: Railway Applications - Communications, signaling and processing systems - Software for railway control and protection systems. Issue: March 2001.

    (6) CENELEC Standard EN 50129: Railway Applications - Communications, signaling and processing systems - Safety related electronic systems for signaling. Issue: May 2002.

    (7) CENELEC Report prR009-004: Railway Applications - Systematic Allocation of Safety Integrity Requirements (March 1999).

    (8) Dr. Hendrik Schäbe, Different Approaches for Determination of Tolerable Hazard Rates. Institute for Software, Electronics, Railroad Technology, TÜV InterTraffic GmbH, 51105 Köln.

    List of Figures in the Paper "A Practical Risk and Safety Assessment Methodology for Safety-Critical Systems"

    Figure 1. Risk and Safety Analysis Overview (From Reference (4))

    Figure 2. Process Details of Risk Analysis (From Reference (4))

    Figure 3. System Design Analysis Summary (From Reference (4))

    Figure 4. A Simple Train Protection Warning System

    Figure 5. Cause-Consequence Analysis (Determination of External Risk Reduction)

    [Figure 1. Risk and Safety Analysis Overview (From Reference (4)). Flowchart with Input, Activity and Output columns. Activities: 1 Define System (functions, boundary, interfaces, environment, ...); 2 Identify (system) hazards; 3 Analyze consequences of hazards; 4 Analyze causes of hazards and identify additional hazards; 5 Allocate Safety Integrity Requirements to subsystems/equipment, iterating until system element level. The activities are grouped into Risk Analysis and System Design Analysis. Other labels in the figure: System definition, Hazard Log, System Requirements Specification, Hazard Analysis, Subsystem Requirements Specification, Risk tolerability criteria (Safety), (Sub-)System Architecture, top level hazards, Risk, THRs, SILs, Failure Rates.]

    [Figure 2. Process Details of Risk Analysis (From Reference (4)). Flowchart with the steps: System Definition; Analyze Operation; Identify Hazards; Estimate Hazard Rates; Identify Consequences (Accidents, Near Misses, Safe State); Determine Risk; Determine THR; leading to System Design Analysis. Other labels in the figure: System Requirements Specification (Safety Requirements), Hazard Log, Risk Tolerability Criteria (Safety).]

    [Figure 3. System Design Analysis Summary (From Reference (4)). Flowchart starting from the hazards H1, ..., Hn and their tolerable hazard rates, together with the System Architecture and the SIL Table. For each hazard: use FMEAs, FTAs, Reliability Block Diagrams, Binary Decision Diagrams, Markov models, etc. as appropriate; for each AND: Common Cause Failure Analysis, fault detection mechanism and time, safety-related application conditions. For each subsystem: 1. Collect contributions to hazards; 2. Determine THR and SIL (SIL and THR for subsystems); apportion failure rates to elements (SIL and THR for elements). Final step: Conduct Verification & Validation of SILs and THRs.]

    [Figure 4. A Simple Train Protection Warning System. Diagram of the numbered components listed below.]

    1. Onboard Computer (OBC)

    2. Transponder Transmission Module

    3. Transponder Antenna

    4. Driver's Console

    5. Tachometer

    6. Emergency Brake Interface

    7. Signal Control Logic

    8. Lineside Electronic Unit

    9. Transponder

    BASIC FUNCTIONALITY DESIRED:

    - Provide driver warning then Emergency Brake Application to prevent Signal Passed at Danger.

    - Provide driver warning and speed supervision with ability to stop train if overspeed condition is ignored by the driver.

    [Figure 5. Cause-Consequence Analysis (Determination of External Risk Reduction). Event tree starting from hazard H1 as a train approaches a Signal at Danger, with Yes/No branches for events including "Engineer passes Signal at Danger", "Engineer notices obstruction, starts braking, but can't stop short of obstruction" and "Engineer does not notice obstruction, plows ahead". Branch probabilities shown in the figure: 0.001, 0.5, 0.2, 0.1. Outcomes: High Speed Collision (0.00005), Low Speed Collision (0.00001), Safe State (0.99994).]