Human reliability analysis – challenges in modelling operational risk
Tim Bedford, Strathclyde Business School, University of Strathclyde
Objectives
Discuss modelling issues surrounding human reliability in operational risk
Consider how time dynamics can be incorporated, and the potential benefits and difficulties
Work done on safety relevant to other operational risks
Example – Lambrigg Derailment
February 2007, Virgin train derails between Preston and Carlisle
1 Fatality, 22 Hospitalised
Primary cause identified as faulty set of points
Inquiry Findings
Deficiencies in the inspection and maintenance regime resulted in the points falling into disrepair. These deficiencies included:
A breakdown in the local management structure responsible for inspection and maintenance
The track patrolling regime’s systematic failure to inspect the area adequately
Quality standards not being communicated or executed in the proper manner
A lack of sample checking of the track to test inspection quality and arrangements
Inquiry Findings
The patrol scheduled for 18 February 2007 was not done
The QA regime did not identify failures in the reliability of inspection regimes, nor failures in application of best practice.
Emergence of “Them & Us” culture
Management structure based on activity, not location
Inquiry Findings
High proportion of staff on temporary promotion
Culture of “Learned Helplessness”
Insufficient records on staff training and competencies
Staff unsure of their contracted responsibilities
Lapsed engineering qualifications
Is risk static?
Clearly not
Physical systems change through time, either through degradation or upgrading
Human systems change through time, as a result of operating procedures, staff ability, organisational changes etc
Should we be concerned about dynamically changing risks?
Maybe yes, maybe no…!
No – over time it averages out to the same as the “static” risk, so that the cumulative risk is the same.
Yes – If different risks change dynamically in a coupled way, then this can magnify the overall effect
Yes – If there is no intervention then the risk at the end may be higher than acceptable (e.g. regulation is often of annual risk)
Yes – If understanding the dynamics helps you create new strategies to reduce risks
Dynamic versus static statistical
PRA models usually assume rates/probabilities not time dependent
[Figure: r plotted against t, showing worst case and achievable curves, and the statistical estimate with conf bounds]
Interacting dynamics of productivity and safety pressures
D. L. Cooke and T. R. Rohleder, Learning from incidents: from normal accidents to high reliability, Sys Dyn Review
Feedback from incidents
D. L. Cooke and T. R. Rohleder, Learning from incidents: from normal accidents to high reliability, Sys Dyn Review
Examples: Accident Precursors; CIRAS
Human reliability models
In widespread use as part of Probabilistic Risk Analysis
Aim to “give a number” as well as an understanding of the source of risk.
Largely based on task analysis, breaking down human behaviour into steps (cognitive, decision, action etc).
Performance shaping factors influence probability of success, and may be common to more than one step
First generation methods: e.g. THERP, HCR, HEART, JHEDI
Second generation methods: e.g. ATHEANA, CREAM
Third generation: Monte Carlo based – linking cognition based models to technical system dynamics
THERP HRA Tree
[Figure: THERP HRA event tree. Branches: Start SP pump 1, 2, 3; Start LH pump 1, 2, 3; FT open press. safety valve 1, PS valve 2, PS valve 3; Monitor primary system pressure & temp. (Table (20-10)1). Three bracketed branch groups are each annotated 7.5E-3.]
1. Start confinement spray pumps
2. Start low pressure pumps
3. Open pressurizer safety valves (depressurization)
4. Monitor primary system temperature & pressure
Stress: Mod high, skilled, dynamic (heavy task load); THERP Table (20-16)5a = 5
Dependency: Action could start as early as 6 minutes, so dependency based on 10 minutes; Operator 2 = complete = 1; Shift Super. = high = 0.5
Assumed all pumps are required
Start SP pump 1: selecting wrong control from functional group, Table (20-12)3 = 1E-3
[1E-3 * 5 (stress) * .5 (dependency)] * 3 branches = 7.5E-3
3E-3 (remaining term in the total)
Total HEP: [(7.5E-3)*3] + 3E-3 = 2.55E-2
EF from Table (20-20)7 = 5
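The arithmetic on the THERP tree can be retraced in a few lines. The following is a minimal sketch, not the THERP procedure itself: the variable names are illustrative, the numbers are the ones quoted on the slide, and the uncertainty bounds assume the conventional reading of an error factor (median divided and multiplied by EF), which the slide does not state.

```python
import math

# Values quoted on the slide (THERP table references as given there).
BASIC_HEP = 1e-3      # selecting wrong control from functional group, Table (20-12)3
STRESS = 5            # stress modifier, Table (20-16)5a
DEPENDENCY = 0.5      # shift supervisor, high dependency
BRANCHES = 3          # three pumps or valves per task group
TASK_GROUPS = 3       # three groups each annotated 7.5E-3 on the tree
REMAINING_TERM = 3e-3 # the extra 3E-3 term in the slide's total
ERROR_FACTOR = 5      # EF from Table (20-20)7

# HEP for one task group: basic HEP adjusted for stress and dependency,
# summed over its branches.
group_hep = BASIC_HEP * STRESS * DEPENDENCY * BRANCHES

# Total HEP as on the slide: three task groups plus the remaining term.
total_hep = group_hep * TASK_GROUPS + REMAINING_TERM

# Assumption (not on the slide): EF gives bounds as median / EF and median * EF.
lower_bound = total_hep / ERROR_FACTOR
upper_bound = total_hep * ERROR_FACTOR

print(f"group HEP = {group_hep:.2e}, total HEP = {total_hep:.2e}")
print(f"assumed bounds: {lower_bound:.2e} to {upper_bound:.2e}")
```

Keeping the modifiers as named constants makes it easy to see how sensitive the total is to the stress and dependency judgements.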
THERP Data Summary Table
What drives the main risks?
The standard HRA models, while useful, do not appear to capture the main sources of risk
Accidents continue, and many (most?) are not due to random human failures
Models do give insight and guidance about risk reduction including prioritization
Qualitative approaches such as normal accident theory and HRO do not give guidance about prioritization, but may give insights about strategies for risk reduction
Organisational failure: Reason’s Swiss Cheese model
Modelling for understanding, or for optimization?
Models typically one of
Formative: inform system, organisation and process design, guiding management practice
Summative: used to support decisions on, e.g., adoption, licensing or maintenance, by modelling cost/benefit trades
Qualitative HR modelling tends to be formative.
Quantitative HR modelling should be summative, but if it does not model the most significant system behaviour then its main value may actually be formative (risk analysis rather than risk management)
Summative Modelling
Model building philosophy
Models appropriate to purpose
Cost-effective
Taking account of uncertainties
Models for DM should be able to include effects of intervention.
Hard and soft interventions possible
Hard example – employ extra staff member to increase capacity
Soft example – give employees performance feedback
Some dynamic approaches to HR
Holmberg et al (2000): suggested use of marked point process
David L. Cooke, Thomas R. Rohleder (2008): used systems dynamics
Zahra Mohaghegh, Reza Kazemi, Ali Mosleh (2009): used hybrid approaches combining SD, PRA and BBNs
Lots of other dynamic risk modelling approaches, e.g. Petri nets, living PSA
Mohaghegh, Kazemi, Mosleh
Common framework
A marked point process requires specification of
Possible marks (event types)
Relevant history for each mark
The likelihood for a mark occurring, given the history
Broadly, all three approaches fit into this framework, with either SD or BBNs driving the likelihood.
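The three ingredients of a marked point process (marks, history, likelihood given history) can be illustrated with a small simulation. This is a sketch under stated assumptions rather than any of the cited models: the two marks, the form of the intensity function and all rates are hypothetical, and the simulation uses the standard thinning technique with an assumed upper bound on the total intensity.

```python
import random

MARKS = ("failure", "near_miss_report")  # hypothetical event types (marks)

def intensity(mark, t, history):
    """Hypothetical conditional intensity (events per year) of `mark` at time t,
    given the history of (time, mark) pairs so far."""
    last_event = history[-1][0] if history else 0.0
    if mark == "failure":
        # Assumed: laxness grows with time since the last event of any type,
        # capped so that the total intensity stays below the thinning bound.
        return min(0.001 + 0.0017 * (t - last_event), 0.2)
    return 0.5  # assumed constant reporting rate

def simulate(horizon, rng, rate_bound=1.0):
    """Simulate by thinning: propose events at rate_bound and accept each
    proposal with probability proportional to the current intensities."""
    t, history = 0.0, []
    while True:
        t += rng.expovariate(rate_bound)
        if t > horizon:
            return history
        u = rng.random() * rate_bound
        cumulative = 0.0
        for mark in MARKS:
            cumulative += intensity(mark, t, history)
            if u < cumulative:
                history.append((t, mark))
                break

events = simulate(horizon=200.0, rng=random.Random(0))
print(len(events), "events;", sum(m == "failure" for _, m in events), "failures")
```

In this framing, an SD or BBN model would replace the hand-written `intensity` function, while the marks and the history bookkeeping stay the same.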
Main difficulties
Complexity – existing models seem very complex… is this necessary for summative purposes?
Measurement scales – for soft interventions these are often vaguely defined and not sufficient to build a robust model
Elicitation – require ways of robustly assessing rates etc for these models
Dependencies – interventions may impact on many different aspects of the system
Model uncertainties – folding these into analysis of options
Possible approaches
Complexity – restrict attention to cost/benefit of “discrete” feedback (major accidents) and “continuous” feedback (e.g. CIRAS). However, a summative approach also needs to account for model uncertainties, which makes the model more complex again!
Measurement scales – use locally valid subjectively defined scales
Elicitation – assess possible changes in system outcomes and derive parameters implicitly (inversion)
Dependencies – model through impact of intervention on common PSFs (eg workload)
Model uncertainties – simulation
Broad brush effects on HR
+ Safety first culture; Clear quality standards
- Quality drift; Productivity focus; Cost cutting
Example discrete feedback
System is designed to have exponential time to failure with MTTF 1000 years
However, due to lack of failures the system management becomes lax, and rate increases. When failure happens, system is reset to design standard. Suppose 1 failure per 30 years.
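[Diagram: feedback loop between hazard rate and failure event, with + and - links]
Model for failure rate is $\lambda + \alpha t$, where $\lambda = 1/1000$ per year is the design failure rate
MTTF is $30 = \int_0^\infty \exp\left(-\left(\lambda t + \tfrac{\alpha}{2} t^2\right)\right) dt = \exp\left(\frac{\lambda^2}{2\alpha}\right) \int_0^\infty \exp\left(-\frac{\alpha}{2}\left(t + \frac{\lambda}{\alpha}\right)^2\right) dt$
Solving gives $\alpha = 0.0017$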
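The slope of the hazard rate in this example can be recovered numerically. The sketch below assumes the hazard has the linear form design rate plus slope times t (the symbol names `lam` and `alpha` are chosen here for illustration), uses the closed form of the MTTF integral obtained by completing the square, and solves for the slope by bisection.

```python
import math

LAMBDA = 1 / 1000     # design failure rate per year (MTTF 1000 years)
TARGET_MTTF = 30.0    # observed: about 1 failure per 30 years

def mttf(alpha, lam=LAMBDA):
    """Mean time to failure for hazard lam + alpha * t.

    Integral of exp(-(lam*t + alpha*t**2/2)) over t >= 0, written in closed
    form with the complementary error function after completing the square.
    """
    return (math.sqrt(math.pi / (2 * alpha))
            * math.exp(lam ** 2 / (2 * alpha))
            * math.erfc(lam / math.sqrt(2 * alpha)))

def solve_alpha(target=TARGET_MTTF, lo=1e-6, hi=1.0, tol=1e-12):
    """Bisection on alpha; mttf decreases as alpha increases."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mttf(mid) > target:
            lo = mid   # MTTF too long: hazard must grow faster
        else:
            hi = mid
    return (lo + hi) / 2

alpha = solve_alpha()
print(f"alpha = {alpha:.4f} per year^2, MTTF check = {mttf(alpha):.3f} years")
```

Because the design rate is small compared with the drift, the result is close to the value obtained by ignoring it altogether, pi / (2 * 30**2).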
Local measurement scales example: SLIM – based on MCDA
Success Likelihood Index Methodology is an early HRA method
Combines Performance Shaping Factor scores using “multiattribute utility” method to quantify Human Error Probability
Key ideas
Ideal points on PSF scale
Expert defined scores
Pairwise comparison for attribute weights
Two point calibration to identify scale length
Common PSFs provide dependency across HR elements
$\log \mathrm{HEP} = a \sum_i w_i \, \mathrm{PSF}_i + b$
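The SLIM relationship between weighted PSF scores and HEP, together with the two point calibration, can be sketched as follows. All numbers here are hypothetical, the function names are illustrative, and two choices are assumptions not stated on the slide: weights are normalised to sum to one, and logarithms are taken to base 10.

```python
import math

def success_likelihood_index(weights, scores):
    """Weighted sum of PSF scores, with weights normalised to sum to 1 (assumption)."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

def calibrate(sli_1, hep_1, sli_2, hep_2):
    """Two point calibration: find a and b in log10(HEP) = a * SLI + b
    from two anchor tasks with known HEPs."""
    a = (math.log10(hep_1) - math.log10(hep_2)) / (sli_1 - sli_2)
    b = math.log10(hep_1) - a * sli_1
    return a, b

def hep_from_sli(sli, a, b):
    """Human error probability implied by the calibrated log-linear scale."""
    return 10 ** (a * sli + b)

# Hypothetical anchors: best case (SLI = 1) and worst case (SLI = 0).
a, b = calibrate(1.0, 1e-4, 0.0, 1e-1)

# Hypothetical task: three PSFs scored on a 0 to 1 scale (1 = ideal point).
weights = [0.5, 0.3, 0.2]
scores = [0.8, 0.5, 0.6]
sli = success_likelihood_index(weights, scores)
print(f"SLI = {sli:.2f}, HEP = {hep_from_sli(sli, a, b):.2e}")
```

Two tasks that share a PSF, such as workload, move together when that score changes, which is how common PSFs carry dependency across HR elements.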
Conclusions
New growth in dynamic human reliability modelling
Approaches more applicable to service operations
Hybrid HR models with feedback loops give the possibility of modelling “soft” interventions
BUT many open problems in implementing robustly
Acknowledgements
Work in EPSRC funded project with Simon French, Jerry Busby, Emma Soane, David Tracy and others