system engineering and software exception handling › ... · “therefore the identification and...

16
SYSTEM ENGINEERING SYSTEM ENGINEERING AND AND SOFTWARE EXCEPTION SOFTWARE EXCEPTION HANDLING HANDLING Herb Hecht Herb Hecht SoHaR Incorporated SoHaR Incorporated Culver City, California Culver City, California

Upload: others

Post on 31-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

SYSTEM ENGINEERING SYSTEM ENGINEERING AND AND

SOFTWARE EXCEPTION SOFTWARE EXCEPTION HANDLINGHANDLING

Herb HechtHerb HechtSoHaR IncorporatedSoHaR IncorporatedCulver City, CaliforniaCulver City, California

Page 2: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

WHY WE ARE HEREWHY WE ARE HERE1.1. Many failures in critical systems are due to Many failures in critical systems are due to

missing or faulty exception handling missing or faulty exception handling and we want to change thatand we want to change that

2.2. They were not tested under the exception They were not tested under the exception conditionsconditions

3.3. The requirements were not specific about The requirements were not specific about exceptions that had to be toleratedexceptions that had to be tolerated

4.4. Comprehensive specification of exceptions Comprehensive specification of exceptions that have to be tolerated is difficult that have to be tolerated is difficult –– or is it or is it impossible?impossible?

Page 3: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

HOW SOFTWARE FAILSHOW SOFTWARE FAILS

““The main line software code usually does The main line software code usually does its job. Breakdowns typically occur its job. Breakdowns typically occur when when the software exception code does not the software exception code does not properly handle abnormal input or properly handle abnormal input or environmental conditionsenvironmental conditions –– or when an or when an interface does not respond in the interface does not respond in the anticipated or desired manner.anticipated or desired manner.””

C. K. Hansen, C. K. Hansen, The Status of Reliability Engineering Technology 2001The Status of Reliability Engineering Technology 2001, , Newsletter of the IEEE Reliability Society, January 2001Newsletter of the IEEE Reliability Society, January 2001

Page 4: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

SOME SPECTACULARSSOME SPECTACULARS

THERACTHERAC--25 25 FATAL RADIATION OVERDOSESFATAL RADIATION OVERDOSESDID NOT SUPPRESS OPERATOR INPUT WHILE DID NOT SUPPRESS OPERATOR INPUT WHILE MAGNETS WERE REPOSITIONEDMAGNETS WERE REPOSITIONED

ARIANE 5ARIANE 5 CRASHED AFTER LAUNCHCRASHED AFTER LAUNCHDISABLED LANGUAGE PROVIDED EXC. HANDL.DISABLED LANGUAGE PROVIDED EXC. HANDL.PERMITTED SHUTPERMITTED SHUT--DOWN OF BOTH NAV SYST.DOWN OF BOTH NAV SYST.

MARS POLAR LANDER MARS POLAR LANDER HARD LANDEDHARD LANDEDFAILURE TO DEFAILURE TO DE--BOUNCE CONTACTSBOUNCE CONTACTS

Page 5: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

IMPORTANCE OF EXCEPTION IMPORTANCE OF EXCEPTION HANDLING HANDLING -- 11

0.35

0.3

0.2

0.15RECOVERY

PROCESSINGERRORS

HARDWARE

SOFTWARE

Toy, W. N., “Fault-Tolerant Design of AT&T Telephone Switching Systems” in Reliable Computer Systems: design and evaluation, Siewiorek and Swarz, eds., Digital Press, Burlington MA, 1992

Page 6: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

IMPORTANCE OF EXCEPTION IMPORTANCE OF EXCEPTION HANDLING HANDLING -- 22

0.54

0.30

0.160.00

DEFENSE

TELEPHONY

COMMONROUTINES

EXECUTIVE

ALL FAILURES GLOBAL FAILURES

Kanoun, K. and T. Sabourin, “Software Dependability of a Telephone Switching System”, Digest of Papers, FTCS-17, Pittsburgh PA, July 1987, pp. 236 – 241

0.3

0.29

0.26

0.15

DEFENSE

TELEPHONY

COMMONROUTINESEXECUTIVE

Page 7: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

EXCEPTION HANDLING AND EXCEPTION HANDLING AND CRITICALITYCRITICALITY

Fraction EH

00.10.20.30.40.50.60.70.80.9

1

Safety

Crit,

Mission

Crit.

Major

Interm

ediate

Minor

Hecht, H. and P. Crane, “Rare Conditions and their Effect on Software Failures”, Proc. of the 1994 Annual Reliability and Maintainability Symposium”, January 1994, pp. 334 – 337.

Page 8: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

RELEVANT QUOTESRELEVANT QUOTES““The main line software code usually does its job. Breakdowns typThe main line software code usually does its job. Breakdowns typically ically

occur when the software exception code does not properly handle occur when the software exception code does not properly handle abnormal input or environmental conditions abnormal input or environmental conditions –– or when an interface or when an interface does not respond in the anticipated or desired manner.does not respond in the anticipated or desired manner.””

C. K. Hansen, C. K. Hansen, The Status of Reliability Engineering Technology 2001The Status of Reliability Engineering Technology 2001, Newsletter of the IEEE , Newsletter of the IEEE Reliability Society, January 2001Reliability Society, January 2001

““Therefore the identification and handling of the exceptional sitTherefore the identification and handling of the exceptional situations uations that might occur is often just as (that might occur is often just as (un)reliableun)reliable as human intuition.as human intuition.””

FlaviuFlaviu CristianCristian ““Exception Handling and Tolerance of Software FaultsException Handling and Tolerance of Software Faults”” in in Software Fault Tolerance,Software Fault Tolerance,Michael R. Michael R. LyuLyu, ed., Wiley, New York, 1995, ed., Wiley, New York, 1995

Page 9: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

WHY THESE FAILURES?WHY THESE FAILURES?

THE PROGRAMS WERE NOT TESTED THE PROGRAMS WERE NOT TESTED UNDER THE CONDITIONS THAT UNDER THE CONDITIONS THAT CAUSED THE FAILURESCAUSED THE FAILURESTHERE WERE NO REQUIREMENTS THERE WERE NO REQUIREMENTS FOR TESTING UNDER THESE FOR TESTING UNDER THESE CONDITIONSCONDITIONSGENERATING REQUIREMENTS FOR GENERATING REQUIREMENTS FOR EXCEPTION HANDLING IS EXCEPTION HANDLING IS DIFFICULTDIFFICULT

Page 10: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

WHY THE DIFFICULTY?WHY THE DIFFICULTY?

EXCEPTION CONDITIONS ARISE EXCEPTION CONDITIONS ARISE FROM SEVERAL LEVELSFROM SEVERAL LEVELSEXCEPTION CONDITIONS ARE MORE EXCEPTION CONDITIONS ARE MORE DIFFICULT TO UNDERSTAND THAN DIFFICULT TO UNDERSTAND THAN MAIN LINE REQUIREMENTSMAIN LINE REQUIREMENTSEXCEPTIONS OCCUR INFREQUENTLY EXCEPTIONS OCCUR INFREQUENTLY BUT REQUIRE DISPROPORTIONATE BUT REQUIRE DISPROPORTIONATE EFFORTEFFORT

Page 11: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

SOURCES OF EXCEPTIONSSOURCES OF EXCEPTIONSOPERATIONAL REQUIREMENTS

LOSS OF POWER, COMMUNICATION, THERMAL CONTROL

IMPLEMENTATION DETAILCALIBRATION ANOMALIES, ACTUATOR STATES, OPERATOR INPUT

COMPUTING ENVIRONMENTHARDWARE FAILURES, MEMORY ERRORS, EXECUTIVE, MIDDLEWARE

MONITORING AND SELF-TESTOVER-TEMPERATURE SENSORS, SYSTEM PERFORMANCE TEST

APPLICATION SOFTWAREASSERTIONS, VIOLATION OF TIMING CONSTRAINTS, MODE CHANGES

Page 12: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

WHO IS RESPONSIBLE?WHO IS RESPONSIBLE?

OPERATIONAL REQUIREMENTS

IMPLEMENTATION DETAILS

COMPUTING ENVIRONMENT

MONITORING AND SELF-TEST

APPLICATION SOFTWARE

SYSTEM

ENGINEERING

EQUIPMEMT

SPECIALIST

VEHICLE

HEALTH MGM’T

SOFTWARESOFTWARE

ENGINEERINGENGINEERING

Page 13: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

REQUIREMENT GENERATIONREQUIREMENT GENERATION

OBJECTIVE OBJECTIVE EXCEPTION CONDITION AND ACTIONEXCEPTION CONDITION AND ACTION

ALGORITHMALGORITHMQUANTITATIVE CONDITION DESCRIPTIONQUANTITATIVE CONDITION DESCRIPTIONTIMING AND RESPONSIBILITY FOR TIMING AND RESPONSIBILITY FOR ACTIONACTION

ASSIGNMENTASSIGNMENTSPECIFY SOFTWARE IMPLEMENTATION SPECIFY SOFTWARE IMPLEMENTATION OF ALGORITHMOF ALGORITHM

Page 14: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

DOES IT ADD UP?DOES IT ADD UP?

CONCEPT SYST. REQ'MTS SOFTW.REQ'MTS SOFTW.DESIGN CODING

OBJECTIVE ALGORITHM ASSIGNM'T

OBJECTIVE ALGORITHM ASSIGNM'T

OBJECTIVE ALGORITHM ASSIGNM'T

OBJECTIVE ALGORITHM ASSIGNM'T

OBJECTIVE ALGORITHM ASSIGNM'T

OPERATIONAL REQM'TS

IMPLEMENTATION

COMPUTING ENV.

MONIT. & SELF-TEST

APPLICATION SOFTW.

Page 15: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

BUILDING BLOCKSBUILDING BLOCKS

EXISTING PRACTICESEXISTING PRACTICESEXPERIENCEEXPERIENCETOOLSTOOLS

INTEREST GROUPINTEREST GROUPWORKING GROUPWORKING GROUPRECOMMENDED PRACTICERECOMMENDED PRACTICE

Page 16: SYSTEM ENGINEERING AND SOFTWARE EXCEPTION HANDLING › ... · “Therefore the identification and handling of the exceptional situations that might occur is often just as (un)reliable

CONTACTCONTACT

[email protected]

Herb Hecht

310/338-0990 X110