
Assessment of Reliability/Dependability – COTS Components

Thuy Nguyen and Ray Torok

Joint IAEA - EPRI Workshop on Modernization of Instrumentation and Control Systems in NPPs

3 - 6 October, 2006

Vienna, Austria

Assessment of Digital Equipment for Safety and High Integrity Applications – Session 4 of 6


© 2006 Electric Power Research Institute, Inc. All rights reserved.

Commercial off-the-Shelf (COTS) Components are Attractive

• Many advantages

– Proven track record

– Lower vendor costs

– More available

– Opportunity to standardize

– Features

– …….

• However, for applications critical to safety or power production, want assurance of high quality/dependability

• Problematic for digital equipment, even more so for COTS

• Don’t forget – other industries have this problem too

• The alternative, developing new equipment from scratch, is even worse for safety and dependability


Review - Digital “Issues”

• New behaviors and failure modes

• Greater complexity

• Human-machine interface

• Software (real-time)

– Quality

– Limited testability

– Common mode failure

– Flaws are ‘designed in’

• ‘Like-for-like’ replacement not generally possible


Assessment of COTS Components is Problematic

• COTS components are usually “evolutionary”

– Variable development process

– Rely on expertise of individuals

– Variable documentation - not up to nuclear safety expectations

– Operating history used to detect/fix problems

– Still, the end product can be highly dependable

• Strong development process is considered important for digital

• Vendor cooperation is needed to ‘look inside the box’ and understand design features, defensive measures and failure modes

• Can’t (and don’t want to) force vendors to use nuclear safety standards

• Want to find and credit all evidence of high dependability


Establishing Assurance Quality / Dependability

[Figure: level of assurance reached for nuclear grade and commercial grade digital equipment. Contributions shown: Commercial Vendor Activities, Nuclear Vendor Activities, Experience, Supplemental Activities, Install/Test, Utility Evaluation, Addt'l Activities, each attributed to Vendor or Utility, building up to an adequate level of assurance. Vertical axis: Level of Assurance.]


Tests and Evaluations Do Not Add Quality, They Seek to Confirm its Existence

• Environmental qualification – temperature, humidity, seismic, electromagnetic compatibility, etc.

• Functional & challenge testing

• Review vendor processes & documentation

– Software development

– Configuration management

– Corrective actions

– Manufacturing

• Review and credit use of standards, third party certifications as appropriate – TUV, IEC, IEEE, ISO, etc. (with verification)


Tests and Evaluations, cont’d

• Operating history assessment (mostly non-nuclear)

– Relevance

– Extent

– Success

– Evidence / documentation

• Critical design review

– software/hardware architectures

– failure modes

– abnormal behaviors

• Grade effort based on complexity and safety significance

• Base judgment on preponderance of evidence

• Want “reasonable assurance” (there are no guarantees)
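As an illustration of how the "extent" and "success" of operating history might be expressed as a number, the sketch below computes the standard one-sided upper bound on a failure rate after failure-free operation. It is not taken from the presentation or from the EPRI guidelines. It assumes a constant failure rate (exponential model), which is a strong simplification for software-based devices, and the function name and figures are hypothetical. Such a bound can support, but not replace, the judgment on relevance and evidence described on this slide.

```python
import math


def failure_rate_upper_bound(device_hours: float, confidence: float = 0.95) -> float:
    """One-sided upper bound on a constant failure rate after zero failures.

    Assumption: exponential model, so P(no failure in T hours) = exp(-rate * T).
    Setting exp(-rate * T) = 1 - confidence and solving for rate gives the bound.
    """
    if device_hours <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("device_hours must be > 0 and confidence in (0, 1)")
    return -math.log(1.0 - confidence) / device_hours


# Hypothetical example: one million failure-free device-hours of relevant,
# well-documented operating history.
bound = failure_rate_upper_bound(1_000_000, confidence=0.95)
print(f"95% upper bound on failure rate: {bound:.2e} per hour")
```

The bound shrinks only in proportion to the accumulated hours, which is one reason operating history is usually combined with the other forms of evidence listed above.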


EPRI ‘COTS Guidelines’ for Digital

• EPRI TR-106439, Guideline on Evaluation and Acceptance of Commercial Grade Digital Equipment for Nuclear Safety Applications, October 1996

– Endorsed by NRC in SER, July 1997

• EPRI TR-107339, Evaluating Commercial Digital Equipment for High Integrity Applications - A Supplement to EPRI Report TR-106439, December 1997

– More detailed, ‘how-to’ guidance

• EPRI – 1011710, Handbook for Evaluating Critical Digital Equipment and Systems, November 2005

– Update based on lessons learned


Popular Components for Evaluation

Smart transmitter

Single loop controller

Positioners for air-operated valve

Circuit breaker trip controller


General Results of EPRI Component Evaluations

• Evolutionary development

• Experienced development team

• Good manufacturing controls

• Successful operating history

• Software development documentation lacking

• “Continue to run” design philosophy

• Limited diagnostics

• Failed parts of EMC tests


Lessons Learned – Selecting Devices and Vendors

• The purchase price is a small fraction of the overall cost for qualification. (Don’t select device based on price)

• Establish acceptable failure modes and abnormal behaviors before selecting candidate devices

• If possible, select simplest device that will do the job

• Costs for qualification will depend on:

– To what extent commercial testing and/or certifications can be credited

– What is required to extend device capabilities beyond commercial specifications (e.g. EMC filter)

– Complexity of the device

– Extent and relevance of device operating history

– Level of involvement and cooperation of device vendor


Lessons Learned – Project Planning

• Avoid special application requirements or configurations not in accordance with manufacturer recommendations.

• Establish appropriate level of QA for control of device, testing, and V&V of test equipment.

• Define and budget for mitigation efforts for problems that may be encountered during testing.

• Establish method for maintaining qualification.


Lessons Learned – Vendor/Device On-site Review

• Review vendor design and development documents before the visit to streamline and focus the on-site review.

• Assure the review team has appropriate experience and expertise.

• Expect CDR shortcomings and plan for compensation.

• Develop a matrix of the critical attributes and methods of verification prior to the on-site review.
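The matrix of critical attributes and verification methods can be kept as a simple table and checked for gaps before the visit. The sketch below is illustrative only: the attribute names and method labels are hypothetical, loosely based on topics mentioned in this presentation, and do not reproduce any EPRI template.

```python
# Hypothetical critical-attribute matrix: each attribute maps to the
# verification methods planned for it. All names are illustrative.
VERIFICATION_MATRIX = {
    "failure modes": ["critical design review", "challenge testing"],
    "EMC immunity": ["qualification testing", "vendor CE Mark test reports"],
    "configuration management": ["vendor process review"],
    "software diagnostics": [],  # nothing planned yet: a gap to close
}


def unverified_attributes(matrix):
    """Return the attributes that have no planned verification method."""
    return sorted(name for name, methods in matrix.items() if not methods)


def attributes_by_method(matrix):
    """Invert the matrix: verification method -> attributes it covers."""
    inverted = {}
    for name, methods in matrix.items():
        for method in methods:
            inverted.setdefault(method, []).append(name)
    return inverted


print("Attributes without a verification method:",
      unverified_attributes(VERIFICATION_MATRIX))
```

Inverting the matrix by method gives the review team a per-activity checklist, for example everything that the critical design review is expected to cover.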


Lessons Learned – EMC Qualification

• Investigate and credit (if possible) vendor testing to CE Mark, European EMC Directives, etc.

• Assure test equipment is immune to expected EMI levels for device qualification testing.

• Identify potential device vulnerabilities through informal testing.

• Fully understand test laboratory capabilities and expertise of personnel.

• Plan and budget for fixes as failures are encountered.


Evaluation of Programmable Logic Controller (PLC) Platforms

• Apply the same COTS evaluation techniques

• Added complexity increases difficulty

• Vendor should take the lead

• Three platforms have been “pre-qualified” by US regulator

– Siemens Teleperm XS

– Invensys/Triconex Tricon

– Westinghouse Common Q

• Others are considering pre-qualification


Inter-Channel / Inter-System Data Communications



Data Communication in Digital I&C Systems

• Advanced digital I&C architectures may feature data communication between:

– Redundant divisions of I&C systems important to safety

– I&C systems of different safety classes

• Objective: improve error detection and fault tolerance

• May concern

– Digital upgrade of obsolete analog I&C systems

– Digital I&C in new plants


IEEE Standard 603-1998

• Standard Criteria for Safety Systems for Nuclear Power Generating Stations

• Independence and physical separation between the redundant channels of a safety system

– The failure of one channel cannot adversely affect the ability of redundant channels to perform the necessary safety functions

• Credible failures in, and consequential actions by, other systems cannot adversely affect the ability of the safety system to perform its intended safety functions


Data Communication and Digital Common Cause Failures (CCF)

• Potential for digital CCF due to possible

– Failure of data communication links

– Uncommon (but correct) modes of data communication links

  • These could trigger concurrent digital failures of redundant divisions or multiple systems

– Error propagation through data communication

• Identification of susceptibilities to digital CCF

– Diversity Guideline of BTP-19 and NUREG/CR 6303: 7 forms of diversity

– Complementary approach based on the analysis of defensive measures (EPRI D3 technical report TR-1002835)


Defensive Measures for Data Communication

• Fault-tolerant overall digital architecture

– Single failure criterion

– Multiple data communication links

  • Defensive measures against CCF of multiple links

– One-way data communication gateways

• Reliable data communication links

– Prevention of data communication failures and CCF

– Stable data communication conditions

• Communicating stations tolerant to:

– Data communication links failures

– Transmission of erroneous data


Simplified Example of Fault-Tolerant Overall Architecture

[Figure: architecture diagram. Labels: Voting & Priority Logic; One-way gateway to lower safety classes (shown twice).]


Simplified Example of Fault-Tolerant Overall Architecture - Cont’d

[Figure: architecture diagram with four divisions: Division A, Division B, Division C, Division D. Remaining labels: A, B, C, D (four of each) and 1 2 3 4 (repeated four times).]
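The architecture figures name four divisions and a "Voting & Priority Logic" block but do not state the voting rule. As a minimal sketch, the code below assumes 2-out-of-4 coincidence voting, and assumes that an unavailable division is counted as a trip demand by default. Both choices, and the function name `vote`, are assumptions made for illustration and are not taken from the slides.

```python
from typing import Optional, Sequence


def vote(inputs: Sequence[Optional[bool]], required: int = 2,
         unavailable_as_trip: bool = True) -> bool:
    """Return True (actuate) when at least `required` inputs demand a trip.

    Each input is True (trip demanded), False (no demand) or None
    (division unavailable, e.g. its data link has failed).
    """
    trips = 0
    for value in inputs:
        if value is None:
            # Assumed policy: count an unavailable division as a trip demand
            # unless the caller chooses otherwise.
            if unavailable_as_trip:
                trips += 1
        elif value:
            trips += 1
    return trips >= required


# Divisions A..D as seen by one voter.
print(vote([True, False, False, False]))  # a single demand does not actuate
print(vote([True, None, False, False]))   # one demand plus one unavailable division
```

Counting an unavailable input as a trip favors safe failures at the cost of more spurious actuations. Ignoring it instead keeps the plant running but reduces the remaining redundancy, so the choice depends on the application.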


Preventing Data Communication Failure

• Application of rigorous development standards

– Low level of residual faults

• As few internal states as possible

– Facilitates testing and recovery

• Transparency to plant conditions

– Data communication links transparent to transmitted data values

– Stable data communication rates and conditions

• Protection against failures of communicating stations

– Station failures cannot affect the behavior of communication links, other than through acknowledgement and transmission of the stations' availability / unavailability

• Detection & correction or signaling of data transmission errors
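The last bullet, detection and signaling of transmission errors, can be sketched with a checksum appended to each message. The example below uses CRC-32 from Python's standard `zlib` module purely for illustration. The slides do not say which error-detecting code a given platform uses, the frame layout is hypothetical, and error correction is not shown.

```python
import struct
import zlib
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as 4 big-endian bytes."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(message: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, else None to signal an error."""
    if len(message) < 4:
        return None
    payload = message[:-4]
    (received_crc,) = struct.unpack(">I", message[-4:])
    return payload if zlib.crc32(payload) == received_crc else None


message = frame(b"\x01\x02\x03")
corrupted = bytes([message[0] ^ 0x01]) + message[1:]  # flip one bit in transit
print(unframe(message))    # payload is returned unchanged
print(unframe(corrupted))  # None: the receiver is told the data is bad
```

Returning an explicit "no data" result rather than a best guess lets the receiving station apply its own handling for missing or invalid inputs.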


Preventing Data Communication CCF

• Different applications and operating conditions

– Communicating stations, Data messages, Cycle time, ...

– Influence conditions need to be identified, and differences / similarities need to be assessed

• Same data communication platform

– Design measures can be taken to reduce the likelihood of CCF due to faults in data communication platform

  • Overall design, Software, Hardware


Stable Data Communication Conditions

• Deterministic cyclic functioning of communication links

– Fixed cycle time

– For each cycle, fixed number of messages of fixed length, of fixed semantics, in a fixed order

• Fixed number and identity of communicating stations

– Station withdrawal and reinsertion do not affect the pre-determined cyclic behavior

– Station states (availability / unavailability) are transmitted at each cycle

• Fixed role for each communicating station

– With respect to each message (send / receive / ignore)
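One way to picture these conditions is a static schedule table that every cycle must match exactly. The sketch below is a simplified illustration: the message names, senders and lengths are hypothetical, and timing (the fixed cycle time) is not checked, only the number, order, sender and length of messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One fixed position in the communication cycle."""
    message_id: str
    sender: str
    length: int  # fixed payload length in bytes


# Hypothetical static schedule, defined at design time and never changed at run time.
SCHEDULE = (
    Slot("STATUS_A", "A", 4),
    Slot("STATUS_B", "B", 4),
    Slot("TRIP_VOTES_A", "A", 8),
    Slot("TRIP_VOTES_B", "B", 8),
)


def cycle_conforms(received, schedule=SCHEDULE) -> bool:
    """Check one cycle against the schedule.

    `received` is a list of (message_id, sender, payload) tuples in arrival order.
    The cycle conforms only if the count, order, sender and length all match.
    """
    if len(received) != len(schedule):
        return False
    return all(
        msg_id == slot.message_id
        and sender == slot.sender
        and len(payload) == slot.length
        for (msg_id, sender, payload), slot in zip(received, schedule)
    )
```

Because the schedule does not depend on the data values carried, the load on the link is the same in normal operation and during a plant transient, which is the "transparency to plant conditions" property described earlier.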


Tolerance to Failures of Data Communication Links

• Multiple communication links in diverse operating conditions

– Reflecting overall redundancy, separation and diversity in the I&C architecture

• Identification and characterization of failure modes of communication links

– Detection of communication links failures by stations

– Safety-classified stations can perform their safety functions or reach safe state even when communication links fail

• Protection of stations against communication link failures

– Link failures cannot affect station behavior beyond the required actions
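Detection of link failures by a station, followed by a defined safe reaction, can be sketched as a missed-cycle counter. The threshold of three cycles, the class name `LinkMonitor`, and the choice of "trip demanded" as the safe substitute value are all assumptions for illustration. The appropriate safe state depends on the safety function.

```python
class LinkMonitor:
    """Declare a link failed when no valid message arrives for too many cycles."""

    def __init__(self, max_missed_cycles: int = 3):
        self.max_missed_cycles = max_missed_cycles
        self.missed = 0

    def end_of_cycle(self, message_received: bool) -> bool:
        """Update once per cycle; return True while the link is considered healthy."""
        self.missed = 0 if message_received else self.missed + 1
        return self.missed < self.max_missed_cycles


def remote_trip_input(link_healthy: bool, last_remote_value: bool) -> bool:
    """Use the remote value while the link is healthy, else the assumed safe value."""
    return last_remote_value if link_healthy else True  # True = trip demanded


monitor = LinkMonitor(max_missed_cycles=3)
history = [monitor.end_of_cycle(ok) for ok in (True, False, False, False, True)]
print(history)
```

The station's own logic never blocks waiting for the link: it takes the monitor's verdict each cycle and continues with either the received value or the substitute.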


Tolerance to Transmission of Erroneous Data

• Plausibility checks of data received through communication links

• Erroneous data caused by a single postulated failure received through communication links cannot prevent a safety classified station from performing its safety functions

– May cause safe failures
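A plausibility check on received data might combine a range check with a rate-of-change check, as in the sketch below. The limits, the function name and the example values are hypothetical. Real limits would come from the process and sensor characteristics, and the reaction to an implausible value is application-specific.

```python
import math
from typing import Optional


def plausible(value: float, previous: Optional[float],
              low: float, high: float, max_step: float) -> bool:
    """Range and rate-of-change check on a value received over a link."""
    if math.isnan(value) or math.isinf(value):
        return False
    if not (low <= value <= high):
        return False
    # Skip the step check on the first sample, when there is no previous value.
    if previous is not None and abs(value - previous) > max_step:
        return False
    return True


# Hypothetical temperature signal: 0 to 400 units, at most 5 units change per cycle.
print(plausible(300.0, 298.0, low=0.0, high=400.0, max_step=5.0))
print(plausible(450.0, 298.0, low=0.0, high=400.0, max_step=5.0))
```

A value that fails the check can be marked invalid and handled like an unavailable input, so that erroneous data from a single failure leads at worst to a safe failure.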


Conclusion

• Appropriate defensive measures can provide reasonable assurance that data communication between redundant channels or safety / non-safety systems will not trigger digital CCF

• Measures to be taken within the data communication subsystems, within safety-classified stations, and at the interface between communication subsystems and stations