Direct Cause vs Root Cause: “A Problem Solving Concept”. INCOSE Enchantment Chapter Meeting, March 14, 2007. Dr David E. Peercy, Sandia National Laboratories, Department 12341, Weapon System and Software Quality


Direct Cause vs Root Cause

“A Problem Solving Concept”

INCOSE Enchantment Chapter Meeting

March 14, 2007

Dr David E. Peercy

Sandia National Laboratories, Department 12341, Weapon System and Software Quality


Presentation Objective

Events have many potential “causes”. We tend to think of “causes” as related mostly to “unwanted” events – but in effect, all events that occur have “causes” – that is, the reason that the event occurs.

The objective of this short presentation/discussion is to gain a better understanding of why it is important to understand the difference between “direct” causes and “root” causes of events.

In so doing, we enhance our capability to influence a much larger class of events – both in preventing unwanted events and ensuring wanted events actually do occur.


An Example of a Problem

USAF F-22A jets grounded by software glitch (posted by Jeremy Epstein, Fri, 23 Feb 2007 15:55:52 -0500)

Navigational systems failed, planes forced to return to Hawaii [visually having to follow their tankers to safety].

The problem turns out to be software (no surprise there). Fix created, "verified", installed, and they're off again.

[Direct or Root Cause addressed?]

A spokesman for Lockheed Martin this week insisted that the navigation software problem was minor. 'The issue was quickly identified in a matter of days and a fix installed in the airplanes, which were flown successfully to Japan,' he said. 'There are 87 of these exceptional fighters and they are out there performing exceptionally well, and their pilots continue to fly them in new and greater ways.'


Examples to Test Our Understanding

Army Training Accident, June 2002

Friendly Fire Deaths, March 2002

Medical “Direct/Root” Cause Determinations

RESOURCE: http://catless.ncl.ac.uk/Risks (Peter Neumann, Stanford University Professor)

The RISK site provides a voluminous list of risks, many of which are computer/software related. It is primarily interested in security and safety risks; summaries are provided with links to more detail.


A Simple Example

Assume each of these factors is as described below:

e: car will not start
d: battery is dead
c: alternator does not function
b: alternator is well beyond its designed service life
a: car is not being maintained according to recommended service schedule

Direct Cause?

Intermediary Causes?

Root Cause?
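One plausible reading of the three questions above can be sketched in Python. It assumes the factors form a single chain in which each factor is caused by the next one down (e by d, d by c, and so on). The `caused_by` mapping and the `classify_chain` helper are illustrative names, not part of the presentation.

```python
# Each factor maps to the factor assumed to have caused it.
# None marks the end of the chain (no further cause recorded).
caused_by = {"e": "d", "d": "c", "c": "b", "b": "a", "a": None}


def classify_chain(event, caused_by):
    """Follow the chain back from the event and label each cause."""
    chain = []
    cause = caused_by.get(event)
    while cause is not None:
        chain.append(cause)
        cause = caused_by.get(cause)
    if not chain:
        return {"direct": None, "intermediate": [], "root": None}
    # First link is the direct cause, last link is the root cause,
    # everything in between is intermediate.
    return {"direct": chain[0], "intermediate": chain[1:-1], "root": chain[-1]}


result = classify_chain("e", caused_by)
print(result)
```

Under that assumption the dead battery (d) is the direct cause, the alternator factors (c, b) are intermediate, and the missed maintenance (a) is the root cause.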


Error, Fault/Defect, Failure

Error
– a human action or lack of action that results in the inclusion of a fault in a product or the way it is used
– the variance between expected and actual results

Fault/Defect
– an accidental condition that causes a product to fail to perform its required function if encountered during operational use

Failure
– an event in which a product does not perform a required function within its specified limits during operational use

[Diagram: an ERROR may lead to a FAULT/DEFECT, which may lead to a FAILURE or, with FAULT TOLERANCE, to NO FAILURE or a REDUCED EFFECT.]
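The distinction between error, fault, and failure can be illustrated with a small Python sketch. Everything here is hypothetical: the programmer's error (forgetting the empty-list case) leaves a fault in `mean`, the fault becomes a failure only when an empty list is encountered in use, and a tolerant wrapper reduces the effect.

```python
def mean(values):
    # FAULT: the programmer's ERROR was not handling an empty list.
    # The fault stays latent until an empty list is actually passed in.
    return sum(values) / len(values)


def tolerant_mean(values, fallback=None):
    # FAULT TOLERANCE: the fault is still present in mean(), but its
    # effect is reduced to a fallback value instead of a crash.
    try:
        return mean(values)
    except ZeroDivisionError:
        return fallback


no_failure = mean([2, 4, 6])        # fault never encountered: no failure
reduced_effect = tolerant_mean([])  # fault encountered, effect contained

try:
    mean([])                        # fault encountered: FAILURE
    failed = False
except ZeroDivisionError:
    failed = True
```

The same fault produces three different outcomes depending on whether it is encountered and whether something contains it.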


Direct Cause

Causes of events may be natural or man-made, active or passive, initiating or permitting, obvious or hidden.

Those causes that lead immediately to the effect are often called direct or proximate causes.

Examples of direct/proximate causes:

Equipment: Arced, Leaked, Over-loaded, Over-heated
Human: Pushed incorrect button, Fell, Dropped tool, Connected wires


Root Cause

Direct causes often result from another set of causes, which could be called intermediate causes, and these may be the result of still other causes.

When a chain of cause and effect is followed from a known end-state, back to an origin or starting point, root causes are found.

The process used to find root causes is called root cause analysis: systematic problem solving.

A root cause is an initiating cause of a causal chain which leads to an outcome or effect of interest.


The Benefits of Problem Solving!

The usual purpose of attempting to find root causes is to solve a problem that has actually occurred, or to prevent a less serious problem from escalating to an unacceptable level (e.g., near-miss safety for aircraft).

The basic concept is that solving a problem by addressing root causes is ultimately more effective than merely addressing symptoms or direct causes.

That is, a “class” of problems may be solved/prevented by addressing root causes rather than just direct causes.


Basic Process - Continue to Ask Why!

Continue to ask “why” until you have reached:

1. Direct, Intermediate, and Root cause(s) - including all organizational factors that exert control over the design, fabrication, development, maintenance, operation, and disposal of the system.

2. A problem/cause that is not correctable by your organization => may be promoted to higher responsible organization.

3. Insufficient data to continue.
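The three stopping conditions above can be sketched as a loop in Python. This is only an illustration under stated assumptions: `why` is a hypothetical lookup from an effect to its known cause, an effect mapped to `None` is treated as a root cause, an effect missing from `why` means there is no data, and `correctable` is the hypothetical set of causes the organization can act on.

```python
from enum import Enum


class Stop(Enum):
    ROOT_CAUSE_REACHED = 1    # stopping condition 1
    NOT_CORRECTABLE_HERE = 2  # stopping condition 2: promote upward
    INSUFFICIENT_DATA = 3     # stopping condition 3


def ask_why(event, why, correctable):
    """Ask "why" repeatedly from `event`; return (causes found, stop reason)."""
    chain = []
    current = event
    while True:
        if current not in why:
            return chain, Stop.INSUFFICIENT_DATA
        cause = why[current]
        if cause is None:
            return chain, Stop.ROOT_CAUSE_REACHED
        chain.append(cause)
        if cause not in correctable:
            return chain, Stop.NOT_CORRECTABLE_HERE
        current = cause


why = {
    "car will not start": "battery is dead",
    "battery is dead": "alternator does not function",
    "alternator does not function": "alternator beyond service life",
    "alternator beyond service life": "car not maintained on schedule",
    "car not maintained on schedule": None,
}
correctable = set(why) - {"car will not start"}

chain, reason = ask_why("car will not start", why, correctable)
```

With the full table the loop stops at the root cause; removing an entry from `why` or from `correctable` makes it stop earlier for one of the other two reasons.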


Example


Why-Causal Tree

[Figure: why-causal tree. The Undesired Outcome is traced to its PROXIMATE CAUSES (Event #1, Event #2, a Condition, and a Failed or Exceeded Barrier or Control). The INTERMEDIATE CAUSES answer why each of these arose (WHY Event #1 Occurred, WHY Event #2 Occurred, WHY Condition Existed or Changed, WHY Barrier or Control Failed/Exceeded). Repeated WHY questions beneath them lead to the ROOT CAUSES.]
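A why-causal tree can be represented as a simple recursive structure. The sketch below is a minimal illustration in Python, assuming that a node with no further "why" recorded is treated as a root cause; the `Cause` class, the `root_causes` helper, and the "root cause A/B/C" labels are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    # Answers to "why did this occur?"; empty means no deeper cause recorded.
    whys: list = field(default_factory=list)


def root_causes(node):
    """Return the leaves of the tree, left to right."""
    if not node.whys:
        return [node.description]
    found = []
    for child in node.whys:
        found.extend(root_causes(child))
    return found


tree = Cause("Undesired Outcome", [
    Cause("Event #1", [
        Cause("WHY Event #1 Occurred", [Cause("root cause A")]),
    ]),
    Cause("Failed or Exceeded Barrier or Control", [
        Cause("WHY Failed/Exceeded Barrier or Control", [
            Cause("root cause B"),
            Cause("root cause C"),
        ]),
    ]),
])

leaves = root_causes(tree)
```

A leaf found this way is only a candidate: it marks where the questioning stopped, which is not always where the causal chain truly begins.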


Example

[Figure: example causal tree for "Lost High Speed Data Stream From Satellite (Mission Failure)". Recoverable nodes: Poor Line of Sight; Thrusters Oriented Space Craft; Satellite Failed To Deploy Antenna; Power Supply Failed; Battery Dead; Beyond Shelf Limit; Installed Improperly; Technician Used Wrong Method to Correct. Annotation: "Root Cause is Much Deeper: Keep Asking Why".]


Potential Problem Analysis Tools

Failure Modes and Effects Analysis (FMEA)
– an inductive engineering technique used at the component level to define, identify, and eliminate known and/or potential failures, problems, and errors from the system, design, process, and/or service before they reach the customer

Fault Tree Analysis (FTA)
– FTA is a deductive analytical technique of reliability and safety analyses and generally is used for complex dynamic systems

Probabilistic Risk Assessment (PRA)
– PRA is a systematic, logical, and comprehensive discipline that uses tools like FMEA, FTA, Event Tree Analysis (ETA), Event Sequence Diagrams (ESD), Master Logic Diagrams (MLD), Reliability Block Diagrams (RBD), and so forth to quantify risk.
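As a rough illustration of how a fault tree can feed a quantified risk estimate, the Python sketch below combines basic-event probabilities through AND and OR gates. It assumes the basic events are statistically independent, which real analyses have to justify, and every probability and event name in it is made up.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all independent inputs occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: power is lost if the main supply fails AND the
# backup battery is dead; the mission fails if power is lost OR the
# antenna fails to deploy. All numbers are invented for illustration.
p_power_lost = and_gate([0.01, 0.1])
p_mission_failure = or_gate([p_power_lost, 0.02])

print(f"P(power lost) = {p_power_lost:.4f}")
print(f"P(mission failure) = {p_mission_failure:.5f}")
```

The AND gate shows why redundancy helps (the probabilities multiply), while the OR gate shows how any single unprotected path can dominate the top event.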


Summary

Direct Cause vs Root Cause
– Issue: level of problem solving

Problem Solving
– Direct Cause: objective is to solve an instance of a potential class of problems
– Root Cause: objective is to solve a class of problems
– Both are useful

Analysis Methods
– Methods exist to analyze events; the goal is to eliminate occurrence of unwanted events and ensure wanted events do occur
– FMEA, FTA, PRA

Q&A?


Examples


Army Training Accident

Incident
– Thu, 13 Jun 2002: two soldiers were killed in training at Ft Drum. They were firing artillery shells, and were relying on the output of the Advanced Field Artillery Tactical Data System. When they forgot to enter the target altitude, the system assumed an altitude of zero. (Ft Drum's elevation is 676 ft.)

Direct Cause
– Soldiers forgot to enter the target altitude

Potential Root Cause(s)
– Software should not default to a valid altitude
– Software/System analysis and modeling/testing inadequate
– Software requirements not adequately specified
– System CONOPS not adequate
– Soldier training inadequate
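The first potential root cause, defaulting to a valid altitude, can be illustrated with a short Python sketch. This is not the fielded system's logic; the function name, the exception, and the units are hypothetical. The point is only that "not entered" gets its own representation (`None`) and is rejected rather than silently replaced by 0.

```python
from typing import Optional


class MissingInputError(ValueError):
    """Raised when a safety-critical input was never entered."""


def firing_solution_inputs(target_altitude_ft: Optional[float] = None) -> dict:
    # None means "not entered". Nothing is assumed on the operator's
    # behalf, so a forgotten altitude cannot silently become 0 ft.
    if target_altitude_ft is None:
        raise MissingInputError("target altitude was not entered")
    return {"target_altitude_ft": float(target_altitude_ft)}


entered = firing_solution_inputs(676)

try:
    firing_solution_inputs()
    rejected = False
except MissingInputError:
    rejected = True
```

An altitude of 0 is still accepted when it is entered deliberately; only the missing value is refused.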


Friendly Fire Deaths

Incident
– A U.S. Special Forces air controller was calling in GPS positioning from some sort of battery-powered device. He had used the GPS receiver to calculate the latitude and longitude of the Taliban position in minutes and seconds for an airstrike by a Navy F/A-18. The bomber crew "required" a second calculation in degree decimals. The crew did not have equipment to perform the minutes-seconds conversion themselves.
– The air controller had recorded the correct value in the GPS receiver when the battery died. Upon replacing the battery, he called in the degree-decimal position the unit was showing, without realizing that the unit is set up to reset to its *own* position when the battery is replaced.
– The 2,000-pound bomb landed on the air controller's position, killing three Special Forces soldiers and injuring 20 others.

Direct Cause
– Taliban position was incorrectly transmitted to the Navy F/A-18 bomber crew

Potential Root Cause(s)
– GPS system default was a valid position rather than an invalid one
– Lack of battery backup to hold values in memory during battery replacement
– Not equipping users to translate one coordinate system to another (reminiscent of the Mars Climate Orbiter slamming into the planet when ground crews confused English with metric units)
– Using a device with such flaws in a combat situation without adequate testing
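The coordinate translation that the users were not equipped to perform is a short calculation: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The Python sketch below shows that standard conversion; the function name and the sample coordinate are arbitrary and have no connection to the incident or to the equipment involved.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if degrees < 0:
        raise ValueError("degrees must be non-negative; use the hemisphere for sign")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Arbitrary sample values, chosen only to make the arithmetic easy to check.
lat = dms_to_decimal(34, 30, 36, "N")
lon = dms_to_decimal(69, 15, 0, "W")
print(round(lat, 4), round(lon, 4))
```

Rejecting out-of-range input, instead of guessing, follows the same reasoning as the first root cause above: an invalid state should never look like a valid position.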


Medical Direct/Root Cause Example 1 - Questions?

Sentinel event: A patient was given the wrong medication and experienced an adverse reaction. As a result, the patient's length of stay was extended by 10 days.

Direct cause: The nurse who administered the medication did not compare the name on the patient's armband to the name on the medication order. The nurse did not follow the patient identification policy.

Root cause (thoughts?): Registration staff placed the wrong armband on the patient's arm to begin with.


Medical Direct/Root Cause Example 2 - Questions?

Sentinel event: Doctor prescribes an anti-seizure drug (phenytoin) and the patient develops a severe allergic reaction known as anaphylaxis. The symptoms were itching, hives, swelling in the throat, wheezing, light-headedness from low blood pressure, nausea, and abdominal cramping.

Direct cause: Patient is allergic to phenytoin.

Root cause (thoughts?): The doctor did not do a thorough check of the patient's medical history, or the patient did not inform the doctor of his/her previous medical history.


Medical Direct/Root Cause Example 3 - Questions?

Sentinel event: A Lasix drip was hung for the wrong patient. The two patients had the same last name.

Direct cause: Interruption during medication administration. The nurse had a very heavy patient assignment and skipped the medication-administration double check with another RN.

Root cause (thoughts?): The double-check process on patient identification and medication administration was missed. All hospital medication should be double checked by two nurses.


Medical Direct/Root Cause Example 4 - Questions?

Sentinel event: A patient slips and falls on a floor that is still slippery after being mopped because another patient had an upset stomach.

Direct cause: The janitor was not able to put caution signs down before the patient walked down the hall, because he was interrupted by a cafeteria worker who needed him to clean up a spill.

Root cause (thoughts?): No caution sign was down.