Mike Taylor: Lessons from history - Case studies that might help spot where things can go wrong

Lessons from history – Case studies that might help spot where things can go wrong Mike Taylor, Advitech Pty Ltd, Mayfield, Australia

Upload: nsw-resources-energy

Post on 06-Jan-2017


Page 1: Mike Taylor: Lessons from history - Case studies that might help spot where things can go wrong

Lessons from history – Case studies that might help spot where things can go wrong

Mike Taylor, Advitech Pty Ltd, Mayfield, Australia


Incident Prevention Strategy, Feb 2016

• Risk-based intervention - develop a framework for the ongoing identification and verification of risk profiling, incorporating risk control measure verification, and consideration of deployment practices to target areas of risk priority.

• Human and organisational factors - research and consider the impact of human and organisational factors on risk management and reporting.


• G Hill cartoon


A few clues on where risk control measures may be weak or missing altogether

• “We’ll risk assess that out”

• “Everybody knows” assumptions

• Specification errors

• Management systems

• Unclear responsibilities

• Human error


Other warning signs

• Too much emphasis on the risk assessment process, rather than the outcomes

• Some methods good for establishing priorities, but not much else

• Reliance placed on barriers and controls

• Controls may not be as effective as first thought

• Control weaknesses may lie dormant for years


A commonly-used method


What about barriers and controls?

• Essential to list them

• Essential to judge their effectiveness

• Be wary of re-evaluating risk until proposed barriers and controls are in place and found to be effective

• Sometimes the existing controls are the ones that are the weakest
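
The caution about re-evaluating risk can be made concrete by crediting a control only once it is both in place and verified. The following is a minimal sketch of that idea in Python. The `Control` fields, the example control names and the helper functions are all assumptions made for illustration; none of them come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Control:
    name: str
    in_place: bool   # has the control actually been implemented?
    verified: bool   # has its effectiveness been checked?


def credited_controls(controls):
    """Controls that may be counted when re-evaluating the risk."""
    return [c for c in controls if c.in_place and c.verified]


def uncredited_controls(controls):
    """Controls that are proposed, unimplemented or not yet verified."""
    return [c for c in controls if not (c.in_place and c.verified)]


# Hypothetical register entries, including an existing but unverified control.
controls = [
    Control("Independent wire count after modification", in_place=True, verified=False),
    Control("Functional test before return to service", in_place=True, verified=True),
    Control("Proposed interlock upgrade", in_place=False, verified=False),
]

for c in uncredited_controls(controls):
    print(f"Not yet credited: {c.name}")
```

Note that the first entry is an existing control: being in place is not treated as evidence that it works.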


Faults and failures

• Failure: Function not performed

• Fault: Loss of capability to perform the function when called upon to do so

• Dangerous undetected faults: May lie dormant for years before failure actually occurs

• Initial fault may be random or non-random


Random hardware failures

• Corrosion, wear, seizure, loosening, etc.

• Predictable as to their rate, but not as to when the next failure will occur

• Often detected and repaired before any damage caused

• Various sources of information available (histories)

• Conventional statistical analysis and modeling
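
The point that random failures are predictable in rate but not in timing can be illustrated with a small simulation. This sketch assumes a constant failure rate, so that times between failures are exponentially distributed. The rate value, the seed and the function name are hypothetical choices, not data from any of the sources cited here.

```python
import random


def simulate_failure_gaps(rate_per_year, n_failures, seed=0):
    """Draw successive times between failures for a constant failure rate."""
    rng = random.Random(seed)
    return [rng.expovariate(rate_per_year) for _ in range(n_failures)]


rate = 0.5  # assumed: one failure every two years on average
gaps = simulate_failure_gaps(rate, 10_000)

# The long-run average is stable and close to 1 / rate ...
mean_gap = sum(gaps) / len(gaps)
print(f"Mean time between failures: {mean_gap:.2f} years (expected {1 / rate:.2f})")

# ... but individual gaps vary enormously, so the next failure cannot be timed.
print(f"Shortest gap: {min(gaps):.4f} years, longest gap: {max(gaps):.1f} years")
```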


Engineers comforted by predictability and numbers

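• Calculating probability of failure on demand, based on a uniform failure rate λ:

$$PFD_G = 2\left[(1-\beta_D)\,\lambda_{DD} + (1-\beta)\,\lambda_{DU}\right]^2 t_{CE}\, t_{GE} + \beta_D\, \lambda_{DD}\, MTTR + \beta\, \lambda_{DU} \left(\frac{T_1}{2} + MRT\right)$$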

• Perhaps even seduced by the numbers?
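
To show how such a number is produced, the sketch below evaluates the expression on this slide term by term. It assumes the second rate inside the square bracket is the dangerous undetected rate λDU, which is how the expression is usually written for a one-out-of-two arrangement. The equivalent down times `t_ce` and `t_ge` are passed in directly rather than derived. Every numerical value is hypothetical.

```python
def pfd_1oo2(lambda_dd, lambda_du, beta, beta_d, t_ce, t_ge, mttr, t1, mrt):
    """Return the three terms of the slide's PFD expression and their sum.

    Rates are per hour and times are in hours.
    """
    independent = 2 * ((1 - beta_d) * lambda_dd + (1 - beta) * lambda_du) ** 2 * t_ce * t_ge
    common_detected = beta_d * lambda_dd * mttr
    common_undetected = beta * lambda_du * (t1 / 2 + mrt)
    total = independent + common_detected + common_undetected
    return independent, common_detected, common_undetected, total


# Hypothetical inputs, chosen only to exercise the formula.
independent, common_detected, common_undetected, total = pfd_1oo2(
    lambda_dd=2e-6, lambda_du=5e-7,   # detected / undetected dangerous failure rates
    beta=0.1, beta_d=0.05,            # common-cause fractions
    t_ce=900.0, t_ge=600.0,           # assumed equivalent down times
    mttr=8.0, t1=8760.0, mrt=8.0,     # repair times and proof-test interval
)

print(f"independent term:       {independent:.3e}")
print(f"common cause, detected: {common_detected:.3e}")
print(f"common cause, undetect: {common_undetected:.3e}")
print(f"PFD total:              {total:.3e}")
```

With these assumed inputs the common-cause undetected term dominates the total. The result is precise to several digits, yet it covers random hardware failures only and says nothing about a wiring or specification error.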


Non-random failures

• So-called “systematic failures”

• Not related to normal degradation mechanisms of corrosion, wear, etc.

• Deterministic rather than probabilistic

• Often more difficult to detect and eliminate

• Actual failure may be the first indication of trouble
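
A toy example may help show what "deterministic rather than probabilistic" means in practice. The sketch below is a made-up piece of signal logic, not drawn from any incident report. The section names, the `checked_sections` list standing in for the specification, and the function itself are all hypothetical.

```python
def signal_aspect(track_sections_clear, checked_sections):
    """Return 'green' only if every *checked* section is clear.

    `checked_sections` stands in for the specification: a section left off
    the list is never tested, however many times the function runs.
    """
    if all(track_sections_clear[s] for s in checked_sections):
        return "green"
    return "red"


# Hypothetical layout with three sections; the faulty specification omits "C".
faulty_spec = ["A", "B"]
correct_spec = ["A", "B", "C"]

normal_day = {"A": True, "B": True, "C": True}
train_in_c = {"A": True, "B": True, "C": False}

# Dormant fault: both specifications give the same answer in normal operation.
print(signal_aspect(normal_day, faulty_spec), signal_aspect(normal_day, correct_spec))

# Once the triggering condition arises, the faulty logic fails every single time.
results = {signal_aspect(train_in_c, faulty_spec) for _ in range(1000)}
print(results)
```

Repeating the faulty case a thousand times gives the same wrong answer each time, so there is no failure rate to estimate. The fault is either triggered or it is not.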


What can be learned from history of non-random faults and failures?

• Quantitative information (component life, failure modes, etc) generally not applicable

• Fewer obvious examples, unlike failures of hardware components

• Not amenable to statistical analysis or modeling

• Subtle, underlying causes, often overlooked in post-incident investigations


Why might systematic (non-random) failures receive less attention?

• People may assume that existing management systems and processes are able to deal with them

• Examples:
  – Design reviews
  – Approvals processes
  – Issues tracking
  – Management of change
  – Check / back-check systems


Case studies

• Barriers and controls found to be less effective than initially assumed

• Non-random failures. Events not equally likely.

• Underlying faults or weaknesses that can remain undetected for long periods


Clapham Junction, London, 1988

• Three trains collided

• 35 people killed

• Signal was green when it should have been red

• A wiring fault, after modification work

• Immediate fault was dormant for about eight hours

• Underlying fault dormant for years


• (pic site)

Source: Hidden A, 1989, Investigation into the Clapham Junction Railway Accident, Department of Transport, London


• (pic site)

Source: Hidden A, 1989


Milton Keynes, North London, 2008

• Signal was green when it should have been red

• Fault was noticed before a collision could occur

• A software specification error, as part of modification work

• Fault was dormant for months


Source: RAIB, 2010, Special Investigation – Review of the railway industry’s investigation of an irregular signal sequence at Milton Keynes, 29 December 2008, Department of Transport


Falkirk, Scotland, 2009

• Points were set in the wrong position for the train to pass safely

• Train at 100 km/hour, fortunately did not derail

• A wiring fault, after modification work

• Fault was dormant for a few hours

• Underlying fault dormant for years


Case study: Falkirk, Scotland, 2009

• Points were set in the wrong position for the train to pass safely

• Train at 80 km/hour, fortunately did not derail

• A wiring fault, after modification work

• Proper testing not carried out after the work

Source: RAIB, 2010 Rail Accident Report Incident at Greenhill Upper Junction, near Falkirk 22 March 2009, Department of Transport Report 04/2010


Source: RAIB, 2010, Rail Accident Report 04/2010


Falkirk, Scotland

• Wire count not performed in the field

• Field workers assumed wire count done in the workshop


Cootamundra, NSW, 2009

• Signal was green when it should have been red

• Fault was noticed before a collision could occur

• An error during the design was not properly tracked

• Fault was dormant for two years


Source: ATSB Transport Safety Report, Rail Occurrence Investigation RO-2009-009, Reported signal irregularity at Cootamundra NSW involving trains ST22 and 4MB7, 12 November 2009


Minneapolis, MN, 2007

• Steel bridge collapsed

• 13 persons killed

• Design fault, carried through to construction

• Fault was dormant for 40 years


Source: National Transportation Safety Board, Accident Report NTSB/HAR-08/03 PB2008-916203, Collapse of I-35W Highway Bridge, Minneapolis, Minnesota, August 1, 2007


Source: Accident Report NTSB/HAR-08/03 PB2008-916203



USAir, Aliquippa, PA, 1994

• Aircraft crashed during landing approach, with all on board lost

• Control system failure

• Original failure modes analysis anticipated such a failure

• Analysis did not properly anticipate the effects

• Fault was dormant for 25 years

• Fault not revealed until two other aircraft incidents


Source: Aircraft Accident Report – Uncontrolled Descent and Collision with Terrain, USAir Flight 427, Boeing 737-300, N513AU, Near Aliquippa, Pennsylvania, September 8, 1994, National Transportation Safety Board PB 99-910401


Source: National Transportation Safety Board PB 99-910401


Alaska Airlines, Anacapa Island, CA, 2000

• Aircraft crashed soon after take-off. All on board lost.

• Mechanical failure of screw thread and nut

• Evidence of wear could have been detected, but was not

• Fault was dormant for ten years


Source: Aircraft Accident Report, Loss of Control and Impact with Pacific Ocean, Alaska Airlines Flight 261, McDonnell Douglas MD-83, N963AS, About 2.7 Miles North of Anacapa Island, California, January 31, 2000, National Transportation Safety Board NTSB/AAR-02/01 PB2002-910402


Source: National Transportation Safety Board NTSB/AAR-02/01 PB2002-910402



American Airlines, Belle Harbor, NY, 2001

• Aircraft crashed shortly after take-off, with all on board lost

• Pilot error

• Haptic feedback (“feel”) of rudder pedals different from many other similar aircraft

• Aggressive use of rudder. Vertical stabilizer overloaded.


Source: Aircraft Accident Report NTSB/AAR-04/04, In-Flight Separation of Vertical Stabilizer, American Airlines Flight 587, Airbus Industrie A300-605R, N14053, Belle Harbor, New York, November 12, 2001, National Transportation Safety Board, PB2004-910404, Notation 7439B


Cape Hillsborough, Qld, Australia, 2003

• Emergency medical services helicopter mission

• Aircraft crashed into sea on foggy night, with all on board lost

• Possible loss of spatial orientation

• Several key risk factors present

• Operators unaware of US study into risk factors

• Fault was dormant for ten years


Source: Aviation Safety Investigation 2003 04282, Bell 407 VH-HT Cape Hillsborough, Qld, 17 October 2003, Australian Transport Safety Bureau


Markham Colliery, UK, 1973

• Brake rod broke (fatigue fracture)

• 18 people killed

• Poor design: No practicable means of lubrication

• Warning from 1961 incident

• Crack probably present when inspected in 1961


Source: Calder JW, 1974, Accident at Markham Colliery Derbyshire: report on the cause of, and circumstances attending, the overwind, which occurred at Markham Colliery, Derbyshire, on 30 July 1973. Department of Energy


Source: Calder JW, 1974



Qantas, Batam Island, Indonesia, 2010

• A380 engine rotor failure

• Significant damage from debris

• Caused by broken oil feed pipe, poorly manufactured

• Failure modes analysis did not properly anticipate the effects

• Two faults, each dormant for several years


Source: ATSB Transport Safety Report, Aviation Occurrence Investigation Report AO-2010-089, 27 June 2013. In-flight uncontained engine failure, Airbus A380, VH-OQA, overhead Batam Island, Indonesia, 4 November 2010


Source: ATSB Transport Safety Report, Aviation Occurrence Investigation Report AO-2010-089, 27 June 2013



Conclusions

• Plenty of new mistakes to be made, without repeating the old ones

• Human error implicated in most of these cases

• Human error rates much higher than those for physical devices

• Statistics not much help when dealing with non-random failures


Conclusions

• Easy to lose sight of the real issues if just focused on process

• Misplaced reliance on barriers and controls, especially existing controls

• Weakness can remain dormant for years


Implications for designers and operators

• Recognise that one systematic fault can undo all the good work with random hardware failure predictions

• Recognise the places where things can go wrong:
  – Specification errors
  – Failure mode assumptions
  – “Everybody knows” assumptions
  – Unclear responsibilities

• Look for subtle signs of problems during operations


Thank you for your attention
