hcl whitepaper: best practices from safety standards for creation of robust and reliable software

Best Practices from Safety Standards for Creation of Robust and Reliable Software

November 2011

© 2011, HCL Technologies. Reproduction Prohibited. This document is protected under Copyright by the Author, all rights reserved.

TABLE OF CONTENTS

Abstract
Abbreviations
Market Trends/Challenges
Solution
Conclusion
Reference
Author Info




Abstract

Mission-critical software has become very reliable and robust by adhering to high-quality safety standards throughout the development lifecycle. Examples of mission-critical software, also called “safety-critical” software, include the software in passenger aircraft and in control systems that operate nuclear and chemical plants. Safety-critical software includes those applications whose malfunction can cause multiple deaths. Conventional software applications, i.e. the more commonly used applications that are not critical, can be made as robust and reliable as mission-critical software by following the simple best practices required for mission-critical software, with very little or no additional effort. Though some mandatory procedures prescribed by safety standards can require significant additional effort, the focus of this paper is on practices that can be implemented in any project with very little additional effort. In this way, any software project can achieve a very low probability of failure. It also becomes easier for organizations that develop software, be they software service providers or OEMs, to adhere to an ever-expanding list of safety standards across domains. In fact, for organizations already practicing CMMI standards, the migration to robust software development is not a major upgrade. This paper describes the best practices that can be adopted at each of the major development cycle milestones, including requirements, design, coding and testing. These are considered from a safety standpoint, and are not intended to describe the software lifecycle.




Abbreviations

Sl. No.   Acronym   Full form
1         CMM       Capability Maturity Model, Level 3
2         CRC       Cyclic Redundancy Checks
3         OEM       Original Equipment Manufacturer
4         RACI      Responsible, Approver, Consulted, Informed




Market Trends/Challenges

When creating software, every organization aims to make it robust, defect-free and less prone to failure. But even with software processes in place, software does crash, much to the displeasure of end users. Today, organizations release software after adequate testing, when the remaining defects are minimal or trivial and after defect density has stabilized. Despite that, they cannot provide a guarantee on the probability of software failure or rule out a defect surfacing during operation. Safety standards focus on these issues and attempt to ensure that software is reliable. Almost all safety standards classify developed software into 3 or 4 levels, depending on the level of reliability. Note: safety-critical software must comply with stringent basic requirements even to be classified at the lowest levels.

So, by following best practices from these standards in conventional software projects, a high level of reliability can be achieved. The emphasis here is to absorb the best practices that will not have a major impact on the effort required to create the software, because developing a strict-compliance, safety-critical application can be much more expensive than a conventional project.


Solution

1. Brief Insights into Safety Standards

Safety standards are defined for specific domains:

o IEC 61508 – the safety standard followed by industrial automation OEMs to build control systems that manage chemical plant operations
o IEC 61513 – the safety standard for equipment controlling operations in nuclear power plants
o DO-178 – the aeronautical standard governing flight software
o ISO 26262 – the safety standard for the automotive industry

Though they are industry specific, the underlying requirements in all these standards are, by and large, common (not identical, as there are some variations). They are best suited to the Waterfall or V-model development lifecycle, though there are cases where Agile practitioners have embraced these standards.

The sections below describe the best practices that can be adapted from these safety standards into each major phase of the development lifecycle of any conventional project.

The assumption here is that organizations already follow basic processes for software development. The erstwhile CMM level 3 can be considered a basic software process.

2. Absorbing the Best Practices

2.1 Project Planning

Project planning for mission-critical projects is very similar to that for conventional projects. In addition to the usual project management content, the project plan should include:

o Clear entry and exit criteria for each phase of the project, the deliverables on completion of each phase, the verification plan for each phase document, and the quality audits that will be performed for each phase
o A skill mapping table and a RACI chart. The skill mapping table lists the skills required to implement the project (rows) against the project team members (columns), with the level of proficiency (Expert, Practitioner, Beginner, NA) filled in for each cell. A RACI chart should accompany the skill mapping table to indicate responsibility levels.
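The skill mapping table and RACI chart described above can also be kept as simple data and checked automatically. The following is a minimal sketch, not a prescribed format: the skills, team member names and activities are hypothetical, and the rule that each activity has exactly one Approver and at least one Responsible person is an assumption made for illustration.

```python
from enum import Enum


class Proficiency(Enum):
    EXPERT = "Expert"
    PRACTITIONER = "Practitioner"
    BEGINNER = "Beginner"
    NA = "NA"


# Skills as rows, team members as columns (all names are hypothetical).
SKILL_MATRIX = {
    "Embedded C": {"Asha": Proficiency.EXPERT, "Ravi": Proficiency.BEGINNER},
    "Static analysis": {"Asha": Proficiency.PRACTITIONER, "Ravi": Proficiency.NA},
    "Test automation": {"Asha": Proficiency.NA, "Ravi": Proficiency.BEGINNER},
}

# RACI chart: activity -> {member: "R", "A", "C" or "I"} (hypothetical data).
RACI = {
    "Requirements review": {"Asha": "A", "Ravi": "R"},
    "Unit testing": {"Asha": "C", "Ravi": "R"},
}


def uncovered_skills(matrix):
    """Return skills for which nobody is at least a Practitioner."""
    adequate = (Proficiency.EXPERT, Proficiency.PRACTITIONER)
    return [
        skill
        for skill, row in matrix.items()
        if not any(level in adequate for level in row.values())
    ]


def raci_violations(raci):
    """Return activities without exactly one Approver or without a Responsible.

    The 'exactly one A, at least one R' rule is an assumption of this sketch.
    """
    bad = []
    for activity, roles in raci.items():
        letters = list(roles.values())
        if letters.count("A") != 1 or "R" not in letters:
            bad.append(activity)
    return bad


skill_gaps = uncovered_skills(SKILL_MATRIX)
raci_gaps = raci_violations(RACI)
```

With the sample data, "Test automation" has no adequately skilled team member and "Unit testing" has no Approver, which is the kind of gap the planning phase is meant to surface.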


2.2 Requirements Phase

2.2.1 Requirement classification

Conventional project execution follows a requirement template, which should include a unique requirement ID, requirement description, assumptions and priority. A safety-critical project uses a similar template; however, each requirement must also be classified as:

o Critical requirement
o Critical support requirement
o Non-critical requirement

This classification can also be incorporated into conventional projects, with the following intentions:

o Critical requirement – A requirement the software must perform; also called a core requirement. For example, in a web-based remote monitoring application, data analysis is a critical requirement.
o Critical support requirement – An additional requirement that is not a core requirement but indirectly contributes to the functioning of core requirements. In the above example, the ability to zoom the data to a wider scale, or compress it to see a larger span, can be considered a support requirement. The application can still be used for data analysis without this function, or with the function disabled; hence it is called a support function.
o Non-critical requirement – A feature of the software that is not related to the core requirements.

With this classification, the priority and importance of each software module become more explicit. Design and testing can then be based on these classifications.
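As an illustration of the classification above, a requirement record can carry its criticality as an explicit field so that design and test effort can be ordered by it. This is a minimal sketch under stated assumptions: the field names, requirement IDs and the non-critical example are hypothetical, and only the three criticality classes and the template fields come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    CRITICAL = 1
    CRITICAL_SUPPORT = 2
    NON_CRITICAL = 3


@dataclass(frozen=True)
class Requirement:
    req_id: str                # unique requirement ID
    description: str
    criticality: Criticality
    assumptions: str = ""
    priority: int = 3          # hypothetical scale: 1 = highest


# Sample records based on the remote monitoring example (IDs are made up).
REQUIREMENTS = [
    Requirement("REQ-003", "Change the colour theme", Criticality.NON_CRITICAL),
    Requirement("REQ-001", "Analyze monitored data", Criticality.CRITICAL),
    Requirement("REQ-002", "Zoom or compress the data view",
                Criticality.CRITICAL_SUPPORT),
]


def by_criticality(requirements):
    """Order requirements so that critical ones are designed and tested first."""
    return sorted(requirements, key=lambda r: (r.criticality.value, r.req_id))


ordered_ids = [r.req_id for r in by_criticality(REQUIREMENTS)]
```

Sorting by the enum value places critical requirements first, then critical support, then non-critical ones.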

2.2.2 Failure mode and effects analysis (FMEA)

For mission-critical applications, FMEA is performed on requirements to analyze the effect of a functionality failure. For conventional projects, too, it can be applied to all critical requirements to identify any missing requirements. Performing a good FMEA after completion of requirements gathering usually adds 5-10% new requirements not previously envisioned, and often reveals whether the defined requirements are flawed.

2.2.3 Mandatory usage of software tools to maintain requirements


Though Microsoft Word or Excel may appear sufficient to capture requirements, a dedicated requirements management tool, e.g. DOORS, helps in maintaining them. Requirements tend to be deleted or modified, and new requirements added, over the life of a project. Without tools, maintaining traceability becomes time-consuming and more error-prone.

2.2.4 Forward and Reverse Traceability

Decomposed requirements should have a reference to the parent requirement and, more importantly, every requirement ID should include forward traceability to the relevant sections in the high-level design, the low-level design and the test cases. It should also have reverse traceability to the parent document (there may be multiple sources) from which the requirement was gathered.

It is mandatory to maintain forward and reverse traceability for each phase of software development: high-level design, low-level design, coding, unit test cases and integration test cases.
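The traceability rule above lends itself to a simple automated check. The sketch below assumes a hypothetical in-memory representation in which each requirement ID maps to its parent sources and to the design sections and test cases that implement it; a real project would export this data from its requirements tool. All IDs and key names are illustrative.

```python
# Links every requirement is expected to carry (key names are assumptions).
REQUIRED_LINKS = ("parent", "high_level_design", "low_level_design", "test_cases")

# Hypothetical traceability data.
TRACE = {
    "REQ-001": {
        "parent": ["SRC-1"],
        "high_level_design": ["HLD-2.1"],
        "low_level_design": ["LLD-4.3"],
        "test_cases": ["TC-10", "TC-11"],
    },
    "REQ-002": {
        "parent": ["SRC-1"],
        "high_level_design": ["HLD-2.2"],
        "low_level_design": [],
        "test_cases": [],
    },
}


def traceability_gaps(trace):
    """Return {requirement ID: [missing link types]} for incomplete entries."""
    gaps = {}
    for req_id, links in trace.items():
        missing = [name for name in REQUIRED_LINKS if not links.get(name)]
        if missing:
            gaps[req_id] = missing
    return gaps


gaps = traceability_gaps(TRACE)
```

Here REQ-002 is reported because it has no low-level design section and no test case, i.e. its forward traceability is incomplete.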

2.3 High Level Design

2.3.1 Tools and pre-existing software

It is good practice to select software tools (compilers, static code analysis or performance test tools) and any pre-existing software (a real-time operating system or any 3rd-party library) that are proven. The design document should justify the use of such tools, preferably citing previous usage experience to substantiate their credibility. This is done to ensure that the tools and pre-existing software do not introduce defects into the product.

2.3.2 Design modularity

The design needs to be highly modular. Independent activities must be grouped into different tasks/processes. The core activities (those that define the product) and non-core activities should run in different tasks. The functions used in these tasks should be uniquely named, even if they perform the same work. Avoid global variables, or keep their use to a bare minimum.

2.3.3 Diagnostics

Provision for diagnosing faults is a major activity for safety-critical applications. For conventional projects, even if diagnostics are not explicitly stated in the requirements, revisiting diagnostics and the ways to retrieve log information will make the design more comprehensive.
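One lightweight way to make log information retrievable is to keep the most recent log records in memory so they can be dumped when a fault is reported. The sketch below is only one possible approach, built on Python's standard logging module; the class name RingBufferHandler, the logger name and the messages are hypothetical.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keep only the most recent formatted log records in memory."""

    def __init__(self, capacity=100):
        super().__init__()
        self.records = deque(maxlen=capacity)  # oldest entries drop off

    def emit(self, record):
        self.records.append(self.format(record))

    def dump(self):
        """Return the retained records, oldest first, for a diagnostic report."""
        return list(self.records)


log = logging.getLogger("diagnostics_demo")
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the demo output out of the root logger

handler = RingBufferHandler(capacity=3)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log.addHandler(handler)

for cycle in range(5):
    log.info("cycle %d ok", cycle)
log.error("sensor timeout")

recent = handler.dump()
```

With a capacity of 3, only the last three records survive, so the error is retained together with the two events that preceded it.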

2.3.4 Graceful degradation

The safety standards suggest many techniques to enhance fault tolerance, i.e. the ability to continue normal operation when a fault occurs. For conventional projects, that may not be very important or may be expensive to implement. One useful technique that can be applied to many conventional projects is graceful degradation. When resources are constrained (processing capability, hard disk space or any other dynamically varying computing parameter that could affect the quality of software performance), provision should be made to switch off non-critical or less critical tasks so that the core tasks can still be performed. The switched-off tasks can be restarted once the resource crunch lessens or disappears.
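The switching logic described above can be sketched as follows. This is an illustrative outline, not a reference design: the task names, the `free_fraction` input and both thresholds are hypothetical, and the use of two thresholds (so that tasks do not flap on and off near a single limit) is an assumption added for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    critical: bool
    running: bool = True


def apply_degradation(tasks, free_fraction, low=0.10, high=0.20):
    """Suspend or resume non-critical tasks based on the free resource share.

    free_fraction: free share of the monitored resource, from 0.0 to 1.0.
    Below `low`, non-critical tasks are switched off; above `high` they are
    restarted; in between, their current state is left unchanged.
    Returns the names of the tasks that are running afterwards.
    """
    for task in tasks:
        if task.critical:
            continue  # core tasks are never switched off
        if free_fraction < low:
            task.running = False
        elif free_fraction > high:
            task.running = True
    return [task.name for task in tasks if task.running]


tasks = [Task("data_analysis", critical=True),
         Task("report_export", critical=False)]

under_pressure = apply_degradation(tasks, free_fraction=0.05)
still_recovering = apply_degradation(tasks, free_fraction=0.15)
recovered = apply_degradation(tasks, free_fraction=0.30)
```

In a real system the free share would come from a resource monitor; for disk space, for example, it could be derived from the values returned by Python's `shutil.disk_usage`.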

2.3.5 CRC check

For embedded applications, CRC checking is recommended when data is transferred between independently functioning modules. As proven standard CRC libraries are available, incorporating them is not time-consuming, and they can effectively detect data corruption.
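As a small illustration of CRC checking between modules, the sketch below appends a CRC-32 to each message and verifies it on receipt, using the `zlib.crc32` function from the Python standard library. The framing (payload followed by a 4-byte big-endian CRC) and the function names `frame` and `unframe` are assumptions made for the example; an embedded project would typically use an equivalent proven C library.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as 4 big-endian bytes."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(message: bytes) -> bytes:
    """Return the payload if its CRC matches; raise ValueError otherwise."""
    if len(message) < 4:
        raise ValueError("message too short to contain a CRC")
    payload = message[:-4]
    (received_crc,) = struct.unpack(">I", message[-4:])
    if zlib.crc32(payload) != received_crc:
        raise ValueError("CRC mismatch: data corrupted in transit")
    return payload


# Sending module
message = frame(b"temperature=42")

# Receiving module: intact data passes the check
payload = unframe(message)

# A single flipped bit is detected
corrupted = bytearray(message)
corrupted[0] ^= 0x01
try:
    unframe(bytes(corrupted))
    corruption_detected = False
except ValueError:
    corruption_detected = True
```

Note that a CRC only detects corruption; what the receiver does next (discard, request retransmission, raise a diagnostic) remains a design decision.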

2.3.6 Abundant use of UML diagrams

Module design should be supported by documenting class diagrams, activity diagrams, state transition diagrams and data flow diagrams.

2.3.7 FMEA on Design

An FMEA performed on design considerations can bring out flaws in the design.

The design should be peer reviewed by a competent person outside the project team, and a design checklist should be prepared.

2.4 Coding

As conventional projects already follow coding standards, it is good practice to use a static code analyzer (e.g. QA-C, FxCop) to check the code for violations of those standards.

2.5 Testing

2.5.1 Negative testing

If it is not possible to simulate the inputs needed to test all scenarios, a separate code branch should be maintained in the source code repository, with the negative scenarios seeded as input. However, this may require significant additional effort, and needs to be followed only when the usual testing did not identify all possible defects, even after executing a comprehensive list of test cases.
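To illustrate what seeded negative scenarios can look like, the sketch below feeds a list of deliberately invalid inputs to an input-validation function and reports any that were wrongly accepted. The function `validate_reading`, its accepted range and the seeded inputs are all hypothetical; the same harness idea applies to whatever interface the seeded branch exercises.

```python
import math


def validate_reading(raw):
    """Convert a raw sensor string to a float within -40.0..125.0.

    Raises ValueError for anything that is not a number in that range.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {raw!r}") from None
    if math.isnan(value) or not -40.0 <= value <= 125.0:
        raise ValueError(f"out of range: {raw!r}")
    return value


# Seeded negative scenarios: every one of these must be rejected.
NEGATIVE_INPUTS = ["", "abc", None, "nan", "1e9", "-41"]


def wrongly_accepted(func, cases):
    """Return the negative cases that the function failed to reject."""
    accepted = []
    for case in cases:
        try:
            func(case)
        except ValueError:
            continue  # rejected as expected
        accepted.append(case)
    return accepted


escaped = wrongly_accepted(validate_reading, NEGATIVE_INPUTS)
valid = validate_reading("21.5")
```

An empty `escaped` list means every seeded negative input was rejected, while a valid reading still passes.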

The same applies to code coverage testing. Though mandatory for safety-critical applications, it can prove a significant drain on effort if not provisioned for at the estimation stage.

2.6 Release Notes


It's good practice to list all unresolved defects in the release notes and, more importantly, to suggest workarounds for users who encounter them.




Conclusion

Safety standards like IEC 61508 and DO-178 have proved to be successful. Products created in compliance with these standards have proved reliable, with a low probability of failure. Naturally, all conventional software creators would like their software to perform as reliably as safety-critical applications without incurring major additional effort or cost. By following the best practices from these safety standards, conventional software, too, can become more predictable and robust with little additional effort. These best practices raise the bar for software quality.


Reference

1. IEC 61508-3, Ed. 2.0, 2010

Author Info

Mahesh Subramaniam, ERS, Industrial Practice, has over 20 years of experience in control systems engineering, software development and IT project management. His current interests include safety-critical applications and condition monitoring.

Hello, I’m from HCL’s Engineering and R&D Services. We enable technology led organizations to go to market with innovative products & solutions. We partner with our customers in building world class products & creating the associated solution delivery ecosystem to help build market leadership. Right now, 14500+ of us are developing engineering products, solutions and platforms across Aerospace and Defense, Automotive, Consumer Electronics, Industrial Manufacturing, Medical Devices, Networking & Telecom, Office Automation, Semiconductor, Servers & Storage for our customers.

For more details contact [email protected]

Follow us on twitter http://twitter.com/hclers and our blog http://ers.hclblogs.com/

Visit our website http://www.hcltech.com/engineering-services/

About HCL

About HCL Technologies

HCL Technologies is a leading global IT services company, working with clients in the areas that impact and redefine the core of their businesses. Since its inception into the global landscape after its IPO in 1999, HCL has focused on 'transformational outsourcing,' underlined by innovation and value creation, and offers an integrated portfolio of services including software-led IT solutions, remote infrastructure management, engineering and R&D services and BPO. HCL leverages its extensive global offshore infrastructure and network of offices in 26 countries to provide holistic, multi-service delivery in key industry verticals including Financial Services, Manufacturing, Consumer Services, Public Services and Healthcare. HCL takes pride in its philosophy of 'Employee First, Customer Second' which empowers our 77,046 transformers to create real value for the customers. HCL Technologies, along with its subsidiaries, had consolidated revenues of US$ 3.5 billion (Rs. 16,034 crores), as on 30 June 2011 (on LTM basis). For more information, please visit www.hcltech.com

About HCL Enterprise

HCL is a $6 billion leading global technology and IT enterprise comprising two companies listed in India - HCL Technologies and HCL Infosystems. Founded in 1976, HCL is one of India's original IT garage start-ups. A pioneer of modern computing, HCL is a global transformational enterprise today. Its range of offerings includes product engineering, custom & package applications, BPO, IT infrastructure services, IT hardware, systems integration, and distribution of information and communications technology (ICT) products across a wide range of focused industry verticals. The HCL team consists of over 85,000 professionals of diverse nationalities, who operate from 31 countries including over 500 points of presence in India. HCL has partnerships with several leading global 1000 firms, including leading IT and technology firms. For more information, please visit www.hcl.com
