
Applied Ergonomics 43 (2012) 176–183

The development of a tool to predict team performance

M.A. Sinclair a,*, C.E. Siemieniuch b,1, R.A. Haslam c,2, M.J.d.C. Henshaw b,1, L. Evans d,3

a Centre for Innovative & Collaborative Engineering, Loughborough University, Loughborough LE11 3TU, United Kingdom
b Department of Electrical & Electronic Engineering, Loughborough University, Loughborough LE11 3TU, United Kingdom
c Department of Ergonomics, Loughborough University, Loughborough LE11 3TU, United Kingdom
d Human Factors Department, BAE Systems Advanced Technology Centre, BS34 7QW, United Kingdom

Article info

Article history: Received 10 June 2010; Accepted 2 May 2011

Keywords: Methodology; Performance prediction; Small groups

* Corresponding author. Tel.: +44 7590 065250; fax: +44 1509 223940. E-mail addresses: [email protected] (M.A. Sinclair), [email protected] (C.E. Siemieniuch), [email protected] (R.A. Haslam), [email protected] (L. Evans).
1 Tel.: +44 1509 635230; fax: +44 1509 635231.
2 Tel.: +44 1509 223042; fax: +44 1509 223940.
3 Tel.: +44 117 302 8000.

0003-6870/$ – see front matter © 2011 Elsevier Ltd and The Ergonomics Society. All rights reserved. doi:10.1016/j.apergo.2011.05.004

Abstract

The paper describes the development of a tool to predict quantitatively the success of a team when executing a process. The tool was developed for the UK defence industry, though it may be useful in other domains. It is expected to be used by systems engineers in initial stages of systems design, when concepts are still fluid, including the structure of the team(s) which are expected to be operators within the system. It enables answers to be calculated for questions such as "What happens if I reduce team size?" and "Can I reduce the qualifications necessary to execute this process and still achieve the required level of success?".

The tool has undergone verification and validation; it predicts fairly well and shows promise. An unexpected finding is that the tool creates a good a priori argument for significant attention to Human Factors Integration in systems projects. The simulations show that if a systems project takes full account of human factors integration (selection, training, process design, interaction design, culture, etc.) then the likelihood of team success will be in excess of 0.95. As the project derogates from this state, the likelihood of team success will drop as low as 0.05. If the team has good internal communications and good individuals in key roles, the likelihood of success rises towards 0.25. Even with a team comprising the best individuals, p(success) will not be greater than 0.35.

It is hoped that these results will be useful for human factors professionals involved in systems design.

1. Introduction

We report the development, verification and validation of a tool to predict the performance of teams when executing a process, in answer to a direct request from engineers and human factors experts in the UK defence industry. In the version discussed in this paper, the tool is entitled 'Performance Evaluation and Assessment for Teams, version 9.1' (PEAT 9.1).

The tool provides designers of military systems at the conceptual stages of design (when variables are still variables and not parameters) some help in risk reduction exercises when considering the staffing of processes. Some sample questions for which the tool could help in providing answers at this early stage are:


• What is the likelihood that this team will be successful in executing the process under consideration?

• By how much can the team size be reduced, before the likelihood of success becomes unacceptable?

• By how much can the attributes of the individuals in the team be reduced, before the likelihood of success becomes unacceptable?

These questions are phrased in terms of success. This is a significant point; 'success' is defined here as executing the process correctly and attaining all of the goals of the process, with no reworking, no extra resources, no extra time involved. 'Likelihood of success' is expressed as a probability, as usual. Note that the tool predicts the success of the team in executing the process; it does not predict that the process will be successful. The team may launch the missile successfully, but it may still miss the target. Consequently, probabilities of team success should be convolved with probabilities of process success to obtain a quantitative prediction of overall success.
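For independent stages this 'convolution' reduces to a simple multiplication; a minimal worked example (the numbers below are illustrative only, not taken from the paper):

    # Illustrative only: combining team success with process success.
    p_team = 0.90     # probability the team executes the process flawlessly (assumed)
    p_process = 0.80  # probability the process achieves its goal, given flawless execution (assumed)
    p_overall = p_team * p_process
    print(f"p(overall success) = {p_overall:.2f}")  # 0.72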

1.1. Theoretical basis for the tool

All the constructs and ideas for the tool are already in existence. There are a number of reviews which describe the evolution of knowledge about teams; good examples of these are (Cummings et al., 1977; Sundstrom et al., 1990; Barrick and Mount, 1991; Leonard and Freedman, 2000; Sundstrom et al., 2000; Devine and Philips, 2001; Hare, 2003; Kozlowski and Ilgen, 2006; Stewart, 2006). There has been steady progress in understanding teams and teamwork, albeit in different schools of thought, but it is generally agreed that coherent, predictive models for practitioners and systems developers are few in number. However, there are models in the literature that could be adapted for predictive purposes; for example Thurstone's Five Factor Model (Thurstone, 1934) and Salas et al.'s Big Five model (Salas et al., 2005).

In both of these cases, and including PEAT, the models make very limited use of the research discussed in the reviews above. This is illustrated by both (Benn, 2005) and (Shanahan, 2005), who independently have produced similar directed graphs combining much of the research discussed in these reviews. Fig. 1 is from Shanahan, and illustrates an important point; it is immediately evident that no practical and useful predictive model could be created from this work, as there are too many unquantified feedback loops, and the input data requirements to drive the model are prohibitive. Consequently, a simplified model is necessary.

Fig. 1. Illustration of variables affecting the performance of a team. From Shanahan (2005).

2. Development of the tool

2.1. The general approach

The development process for the tool followed standard engineering processes, as opposed to a scientific process. This was deliberate; science answers the question 'why…?', whereas engineering answers the question 'what solution can we find…?'. Consequently, the approach adopted here is a version of the standard systems engineering methods as outlined in Haskins (2010). Within this so-called 'VEE process', the stages from 'Detailed design' to 'Integration and test' were iterated to produce incremental improvements to the tool, so that the development process also corresponded to a spiral development model (Boehm, 1988), resulting in PEAT 9.1, i.e. the version which is described in this paper.



Three constraints guided the development of the tool. These were defined in the initial phases of tool development, based on interviews with key stakeholders. Separate interviews were undertaken with individuals involved in the engineering lifecycle: 3 engineers, 2 project managers, 4 human factors experts, and 2 senior engineering discipline managers, all within the same defence organisation but in different business units, ranging from aerospace to underwater:

• A user-defined constraint: If the tool needs more than a two-page manual to explain it, it will not be used. This was a unanimous view among a target group of engineers who were interviewed at the beginning of the project and reflects the practicalities of engineering life; firstly, there are never enough engineers in society, and therefore they are usually highly unwilling to spend significant time on training for an unknown tool that might not produce time, cost or quality benefits.

• A business process constraint: Systems designers are already familiar with, and may be using as a standard procedure, techniques such as HEART (Human Error Assessment & Reduction Technique (Williams, 1986)) and CREAM (Cognitive Reliability and Error Analysis Method (Hollnagel, 1998)) for assessing the reliability of individuals. The tool should incorporate these techniques, or enable the incorporation of any other equivalent company technique, in order to enhance the ease of acceptance into design processes, and to help to address the negative aspects in the bullet-point above.

• A design constraint: At the conceptual stages of design, little will be known about the individuals in the team that will be expected to execute any new processes, apart from generic attributes. Equally, the process will be undefined – perhaps just a single flow diagram sketched on a sheet of paper. Hence, the tool must make a minimum demand for input data by reducing to a minimum the variables involved.

In summary, the first constraint has been met, in the form of a two-page user manual, though a longer version of the manual is also available. The second constraint has been met by creating a three-stage tool: Stage 1 collects data about the attributes of the individuals in the team and the intercommunications deemed necessary for execution of the process; Stage 2 collects an analysis of the process' organisational environment, using either HEART or CREAM, or the organisation's own in-house technique for this purpose; and Stage 3 convolves the outputs of Stages 1 and 2, and delivers a likelihood of success, depending on the binding of the team to the process. Fig. 2 illustrates the three stages for using the tool.

Fig. 2. Stages of PEAT 9.1, illustrating capture of the team and binding to the process, the calculation of initial Human Error probabilities (HEPs) using standard techniques, the adjustment of these due to teamworking and teamworker characteristics, and final convolution to deliver a prediction of success.

The third constraint has been met by reducing to a minimum the variables involved, and hence the input data required.

Based on the constraints described above, the model should contain only those variables widely discussed in the literature, and which practitioners deem to be critical to team performance. Because of its intended use by engineers and others not necessarily experts in psychological matters, only the irreducible number of variables should be used. These were:

• Trustworthiness (dependability of an individual to deliver results, on time, and in full).

• Teamworking skills (how constructive the person is in aiding the team to its goals).

• Domain knowledge and skills possessed by the individual for the process under consideration.


• Communication links between team members necessitated by the process.

• Authority relationships within the team, as determined by rank, role, and/or expertise.

The derivation process for this choice of variables was by a series of 5 discussions among a group of academic subject matter experts (seven in total, with over 100 years of experience in working with industry), allied to the reviews mentioned earlier.

Hence, we have another five-factor model which has some overlaps with those of Thurstone (1934) and Salas et al.'s Big Five model (Salas et al., 2005). The variables selected for PEAT are those with which systems developers and systems operators are familiar on an everyday basis, thus adding to the acceptability of the model.

2.2. Description of the tool

The tool currently exists as a spreadsheet, with individual worksheets for each of the stages 1–3 described above.

For each of the first three variables listed at the end of the section above (Trustworthiness, Teamworking skills and Domain knowledge and skills), four-point rating scales are provided, worded for non-experts. Each runs from a zero point to 'expert'. Communications are captured in a matrix as one-way links; a discussion between two team members is represented by two links. There is no attempt to characterise the link itself.

Authority within the team is assessed by a five-point scale. This runs from 'instructed, obeys' through 'discusses as equals' (the mid-point) to 'instructs, expects obedience'. The rationale for these choices was ease-of-use by non-experts; the verification and validation studies reported later indicate that the format is sufficient for purpose.
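To make these inputs concrete, the sketch below shows one way the Stage 1 data might be represented; the class and field names are our own illustration, not part of PEAT itself.

    from dataclasses import dataclass

    @dataclass
    class TeamMember:
        """Stage 1 ratings for one team member, on the scales described above."""
        name: str
        trustworthiness: int  # four-point scale, 0 (zero point) to 3 ('expert')
        teamworking: int      # four-point scale, 0 to 3
        knowledge: int        # four-point scale, 0 to 3
        authority: int        # five-point scale, 1 ('instructed, obeys') to 5 ('instructs, expects obedience')

    team = [
        TeamMember("A", trustworthiness=2, teamworking=1, knowledge=2, authority=3),
        TeamMember("B", trustworthiness=3, teamworking=3, knowledge=3, authority=4),
    ]

    # One-way links: comms[i][j] == 1 means member i communicates to member j.
    # A discussion between A and B is therefore two links, as in the text.
    comms = [
        [0, 1],
        [1, 0],
    ]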

In stage 1, each team member is rated on these variables. The first three ratings, for trustworthiness, teamworking and knowledge, are combined to produce a Performance Shaping Factor (PSF), which acts as a multiplier that adjusts each individual's propensity for error on the process, used later in the assessment. The PSF is obtained by means of a look-up table. Since the quality of this table is critical to the predictions of the tool, its construction is outlined below.

The look-up table was constructed by a total of 8 human factors experts, all practitioners with long experience of teams, organisations and processes, with professional qualifications in cognate disciplines and with professional practitioner affiliations. In all, these experts represented some 200 years of professional experience in industry.

For each, this was a time-consuming, difficult task; as one expert expressed it, "So, you want me to produce a PSF for some individual with a trust rating of, say, 2, a teamworking rating of 1, and a knowledge rating of 2, for some process I do not know?" The response was "Exactly that, O Guru." And they did; the results were smoothed and inserted into the tool. It is to the credit of these experts that PEAT can predict satisfactorily, as shown by the verifications and validations.
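Mechanically, the look-up works as sketched below, continuing the TeamMember example above. The PSF values here are invented placeholders; the real table is the experts' smoothed judgements and is not reproduced in the paper.

    # Placeholder PSF look-up: maps (trustworthiness, teamworking, knowledge)
    # ratings to a multiplier on the base human error probability (HEP).
    # Values are invented for illustration; PSF > 1 worsens the HEP, PSF < 1 improves it.
    PSF_TABLE = {
        (0, 0, 0): 10.0,  # worst ratings on all three scales
        (2, 1, 2): 2.0,   # the example ratings quoted to the experts
        (3, 3, 3): 0.5,   # 'expert' on all three scales
    }

    def adjusted_hep(base_hep: float, member: "TeamMember") -> float:
        """Apply a member's PSF to a base HEP from HEART or CREAM."""
        psf = PSF_TABLE[(member.trustworthiness, member.teamworking, member.knowledge)]
        return min(1.0, base_hep * psf)  # keep the result a valid probability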

The next step, stage 2, captures the organisational environment around the process. This corresponds to standard Human Reliability Assessment approaches (e.g. Swain and Guttmann, 1983; Embrey et al., 1984; Kirwan, 1988; Lee et al., 1988; Dougherty, 1990; Kirwan, 1994; Cooper et al., 1996; Hollnagel, 1996; Perrow, 1999; Bot, 2003; Dekker, 2005a,b; Hollnagel, 2005; Kirwan, 2005; Kirwan and Gibson, 2007; Johnson et al., 2009). Of these, in the UK the techniques known as HEART (Human Error Assessment & Reduction Technique) (Williams, 1986) and CREAM (Cognitive Reliability and Error Analysis Method) (Hollnagel, 1998) are well-known and widely used. Therefore, both HEART and CREAM were incorporated into the tool, though organisations which have their own methods for assessing human reliability could replace these.

The final stage 3 combines the base probability of error from HEART or CREAM, the PSF values for each individual (adjusting the base probability of error according to each individual's characteristics), the authority ratings, and the communications matrix (to account for the contributions of other team members to one's own performance) to arrive firstly at what is called 'interactive probabilities of error' for each individual; in other words, acknowledging peer effect. The initial probabilities of error calculated by HEART, CREAM (or other technique) are adjusted by these extra variables. The assumption is that a person's performance will be influenced by the performances, knowledge, and teamworking capabilities of those who communicate with that person. This influence might be feed-forward, feedback, or just the observable quality of their work.

Consider team member A, who is a poor performer in the process (though perhaps for other processes he/she is good); in other words, the PSF for A is greater than 1.0. Assume B, who is highly competent, communicates with A during the process. B's influence on A is determined by B's basic error rate, B's PSF, and B's authority rating within the group, compared to A's authority. This applies to all team members communicating with A, and these values are combined to produce an overall effect on A.

The change to A's performance due to this accumulated effect is determined by an influence factor, adjustable by the analyst depending on the operational context. Consider a helicopter pilot taking avoiding action on the basis of other crew members perceiving a missile on its way and providing feed-forward to the pilot. When the pilot goes into missile-avoidance mode, there is still the requirement to fly the helicopter safely (else the missile will achieve its desired effect, even if it misses). So, receipt of the missile information has an effect, but only a partial effect.

Using terms appropriate to each domain, a search in the domains of psychology, social psychology, business, economics, and sports science produced a total of four papers from which an estimate of this partial effect could be inferred, of which three (Tziner and Eden, 1985; Mathieu et al., 2000; Depken and Haglund, 2007) needed the least number of assumptions in order to calculate this partial effect. The variable for this partial effect is termed the 'Influence factor' in PEAT. A default value of 50% is provided for this. Fortunately, the verification and validation tests indicate that 50% is acceptable.
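The exact combination rule is not published in the paper, so the following is only one plausible reading of the mechanism just described: each communicating neighbour pulls a member's error probability towards their own level, weighted by relative authority and damped by the Influence factor (default 0.5). The weighting scheme is our assumption, not PEAT's algorithm.

    def interactive_hep(own_hep: float, neighbours: list, influence: float = 0.5) -> float:
        """Illustrative peer adjustment; an assumed form, not PEAT's published rule.

        neighbours: (neighbour_hep, authority_weight) pairs for every member who
        communicates with this person; weight > 1 means the neighbour outranks them.
        """
        if not neighbours:
            return own_hep
        total_w = sum(w for _, w in neighbours)
        peer_hep = sum(h * w for h, w in neighbours) / total_w  # authority-weighted mean
        # Move part-way (the influence factor) from one's own level towards the peers' level.
        return (1 - influence) * own_hep + influence * peer_hep

    # A poor performer (HEP 0.4) talking to one strong, senior colleague (HEP 0.05):
    print(interactive_hep(0.4, [(0.05, 2.0)]))  # 0.225 - pulled down by the good neighbour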

Next, the team's binding to the process is addressed. So far, we have calculated an 'Interactive probability of error' for each team member (see Fig. 2), that takes account of the communications in the team and the influence of team members upon each other. We now have to combine these to deliver team performance for the process.

A number of papers have addressed this issue (VanDeVen et al., 1976; McGrath, 1984; Sundstrom et al., 1990; Tesluk et al., 1997; Devine, 2002). Unfortunately, these focus on internal structural variables of the team and the development of theory; nevertheless, they provide various classifications of teams. These classifications were combined during discussions with other subject matter experts and through reviews of internal reports. These indicated that three classes of team were the most common; these are described below.

• The 'Aircrew' team. Consider the helicopter example, discussed above. The pilot executes the process of flying; the rest of the crew act only as advisors, but do not play a part in flying the helicopter.

• The 'Boatcrew' team. Consider a rowing eight (nine, with the coxswain). From start to finish, each person has a specific task, and cannot perform anyone else's task. Consequently, the absence of any crew member ensures failure.

• The 'Omnicompetent' team. Here, anyone can perform anyone else's task, and may do so in executing the process. A Call Centre is an approximation to this.

Likelihoods of success for each of these teams, together with confidence limits, are calculated and presented for the choice of the user. For all other team types, it will be necessary for the user to combine the individual Interactive probabilities of error using the Laws of Probability.
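The paper does not give the combination formulas, but under an independence assumption the three team classes map naturally onto textbook probability rules, as sketched below; these forms follow the team descriptions above, not a published PEAT formula.

    def team_success(heps: list, team_type: str) -> float:
        """Combine per-member interactive error probabilities into p(team success).

        Assumes members fail independently; illustrative readings of the three classes.
        """
        if team_type == "aircrew":
            return 1.0 - heps[0]  # only the pilot (first member) executes the process
        if team_type == "boatcrew":
            p = 1.0
            for h in heps:
                p *= 1.0 - h      # every member must succeed
            return p
        if team_type == "omnicompetent":
            p_all_fail = 1.0
            for h in heps:
                p_all_fail *= h   # the team fails only if every member fails
            return 1.0 - p_all_fail
        raise ValueError("other team types: combine manually via the laws of probability")

    print(team_success([0.1, 0.2, 0.05], "boatcrew"))  # 0.684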

Note that where teams have supervisory roles within them, these are addressed by appropriate use of the authority ratings.

3. Current state of the tool

PEAT version 9.1 exists as a spreadsheet on a laptop. The tool is constrained to deal with teams of ten or fewer members; when the tool is fully validated and the laptop spreadsheet version is converted to a web-based application, this constraint will be removed. Verification and validation studies have been carried out, and are outlined in the section that follows. The tool has as support two user manuals: a 2-page version, matching one of the original constraints, suitable as an aide-memoire for the user, and a longer manual that discusses extra topics such as tailoring the tool for the organisation in which it will be used. As a final form of support, each worksheet within the spreadsheet includes a brief set of instructions-for-use. It is expected that the defence organisation involved with this work will convert the tool into a centrally-managed, web-based tool, with controlled future development.

4. Verification and validation of the tool

In this section we review the question of whether the tool works as planned. The evidence so far indicates that it does. A second question is, 'Is it sensitive enough?' (i.e. can the tool produce enough probability values between 0.0 and 1.0 to cater for the range of teams likely to be encountered); the indications are that it is. Discriminability is discussed first below. Normally, one would describe the verification (does it do what we say it does? Does it behave as we might expect?) of the tool first, then its validation (does it predict in the real world?). Because of an unexpected finding during verification, it seems better to discuss validation first, then verification.

4.1. Discriminability of the tool

For a team of two individuals (the minimum team), using CREAM to categorise the organisational environment and assessing all possible combinations of input values for these two individuals, 1360 values may be obtained between 0.0 and 1.0. For a team of 10, using HEART, the range of values is approximately 57 billion (albeit not adjusting for duplicate numbers). There appears to be sufficient capability for discrimination between team arrangements which are similar but differ in small ways for the tool to be useful for most design purposes.
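The flavour of this enumeration can be reproduced by brute force; since the real calculation lives in the PEAT spreadsheet, the predictor below is a stand-in stub and the counts it yields are not the paper's figures.

    import itertools

    def predict_success(a, b) -> float:
        """Stand-in stub for PEAT; any deterministic function of the ratings works here."""
        return (sum(a) + 2 * sum(b)) / 50.0  # a, b = (trust, teamworking, knowledge, authority)

    # 4 x 4 x 4 x 5 = 320 rating combinations per member, 102,400 per two-person team.
    members = list(itertools.product(range(4), range(4), range(4), range(1, 6)))
    distinct = {round(predict_success(a, b), 6) for a in members for b in members}
    print(f"{len(members) ** 2} input combinations, {len(distinct)} distinct predictions")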

4.2. Validation of the tool

To date, 18 validation exercises have been carried out, and these are summarised in Table 1. All are historical cases, with the results known; i.e. they represent validation-by-criterion. However, in all cases but the last three the criterion was subjective, since the processes were not repeated. The last three represent repeated processes; hence, a numerical criterion is available.

Table 1. Aggregated data for validations. A dashed line separates the 'one-off' tests (1–15) from those where repeated results were obtained (16–18). Probabilities and 95% confidence limits for the test, as calculated by PEAT, are given to two significant figures, as is usual for human reliability measures. The last column provides the subject's actual comment on the accuracy of PEAT as compared to the subject's experience of the team. Minimal information is included due to commercial and individual confidentiality concerns.

Test | Nature of test | PEAT prediction | LCL | UCL | Comment by test subject
1 | Instantiation of a Fire Control System in an armoured vehicle | 0.76 | 0.46 | 0.99 | "That looks good."
2 | Development of control system for a UAV | 0.59 | 0.30 | 0.95 | "OK, if a little generous"
3 | Development of Health Mgmt capability for system | 0.87 | 0.82 | 0.93 | "I'm happy with that result - perhaps a little bit high"
4 | Creating an engineering Technical Demonstrator | 0.92 | 0.88 | 0.96 | "Result is OK; perhaps a bit high"
5 | Development of communications system for naval vessel | 0.95 | 0.91 | 0.98 | "Rings reasonably true"
6 | Bid preparation for submission to US DoD | 0.45 | 0.21 | 0.90 |
7 | Execution of a Design & Build project in M.Eng programme | 0.49 | 0.46 | 0.52 | "Result is a bit low"
8 | Team delivering HFI to manufacturing Technical Demonstrator | 0.92 | 0.66 | 0.99 | "OK; but doesn't account for a weak team member. Got it together because of the efforts of the rest of the team"
9 | Development of guidelines for design approvals in government dept. | 0.90 | 0.86 | 0.94 | "That's OK"
10 | Software development for NHS | 0.60 | 0.36 | 0.94 | "Estimate is a bit low; would have expected about 0.75"
11 | Development of a UAV ground station | 0.48 | 0.47 | 0.50 | "A bit low - would have expected around 0.7"
12 | Management team in University | 0.83 | 0.78 | 0.88 | "Result is OK, but this isn't a normal team; more a collection of individuals with related interests tasked with a common goal"
13 | Pension Bd of Trustees managing the pension fund | 0.99 | 0.99 | 0.99 | "That's about right."
14 | SAS patrols in hostile territory | 0.63 | 0.30 | 0.99 | "Difficult to assess whether value is correct. If contact with the enemy, all plans change, therefore 'failure'. But no contact = failure, too. But value is OK."
15 | Mentoring team for military Outward Bound scheme | 0.90 | 0.86 | 0.96 | "That's OK; we got it right most of the time despite outside influences"
--------------------------------------------------------------------------------
16 | Preparing tanks for Gulf War | 0.89 | 0.54 | 0.99 | "On average, 14 out of 16 would go straight through."
17 | Preparing RAF Tornados for Gulf War | 0.082 | 0.019 | 0.10 | "About right - only 3 of 26 went through without rework"
18 | High-tech jobbing shop making military-standard RAM for development studies | 0.97 | 0.97 | 0.97 | "That's interesting. Expected monthly performance for this process is between 0.90 and 0.98."

Because these are historical cases, each one was conducted by interview with a participant in that case; the participant was usually chosen from the engineers involved in the case. Occasionally, when there were no engineers available, another professional would be selected – for example, a human factors expert. In all cases, the selected individual had a close connection to the outcome of the case. Interviews were carried out in privacy, and were semi-structured around the following topics: rationale for the interview, description of the case and the process, description of the team involved, entry of data into PEAT, commentary on PEAT results, and capture of comments on PEAT (especially those relevant to its improvement).

A standard non-parametric Kolmogorov–Smirnov test (Siegel and Castellan, 1995) was carried out to discover any significant departure from accuracy, shown in Table 2 below (note that a Chi-squared test is not appropriate due to Cochran's criterion (Cochran, 1954)). The data for this were the subjective opinions of the team experts, reduced to the two classes, 'miss' and 'correct prediction'. For this test, any comment in Table 1 that indicated that the prediction was not acceptably accurate (tests 7, 10, 11) counts as a 'Miss'. H0: no difference between predictions and reality; result p > 0.05, indicating that there is no significant difference between the predictions and reality as reported by the participants, indicating that the tool can be considered to be accurate.

Table 2. Kolmogorov–Smirnov test for departures from an ideal distribution. Getting 3 predictions wrong represents a proportion of 0.214. According to Table E of Siegel (Siegel and Castellan, 1995), this indicates that the Null hypothesis (no departures from Ideal) is not rejected (p > 0.05).

Classes | Ideal cumulative | Actual cumulative | Proportional difference
Miss | 0 | 3 | 0.167
Correct predictions | 18 | 18 | 0

Another standard non-parametric test, the Binomial test (Siegel and Castellan, 1995), was executed on those estimates deemed above and below, according to the subjective comments in Table 1, to test for any bias in the predictions. The data for this were the subjective opinions; for this analysis, if the team expert made any comment about the prediction being high or low, irrespective of whether the value was accepted, it was not allocated to the 'accurate' cell. Table 3 shows the results of this. H0: no overall bias in the predictions; result p > 0.05, indicating that there is no significant bias in the predictions compared to reality as reported by the participants.

Table 3. Binomial test for bias in predictions. Subjective comments from Table 1 were used for this. According to Table D of Siegel (Siegel and Castellan, 1995), this indicates that the Null hypothesis (no bias in predictions) is not rejected (p > 0.05).

Prediction below user's opinion | Prediction above user's opinion | Predictions deemed exact
3 | 4 | 11
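Both checks are easy to reproduce. A sketch using scipy for the binomial test, and the usual large-sample critical value for the one-sample Kolmogorov–Smirnov statistic (the paper itself read both results from Siegel and Castellan's tables):

    from math import sqrt
    from scipy.stats import binomtest

    # Kolmogorov-Smirnov: largest gap between ideal and actual cumulative proportions (Table 2).
    n = 18
    d = max(abs(0 / n - 3 / n), abs(n / n - n / n))  # D = 0.167
    d_crit = 1.36 / sqrt(n)                          # ~0.32 at alpha = 0.05 (large-sample approximation)
    print(f"D = {d:.3f}, critical ~ {d_crit:.3f}: H0 {'rejected' if d > d_crit else 'not rejected'}")

    # Binomial test on the 7 non-exact predictions from Table 3: 3 below vs 4 above.
    print(f"binomial p = {binomtest(3, n=7, p=0.5).pvalue:.3f}")  # well above 0.05: no bias detected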

These tests and tables show that the tool can produce reasonable predictions of the performance of real teams executing real processes. It is noteworthy that in each of the last three tests (16, 17, 18 in Table 1), which were on repetitive, formalised processes where a quantitative criterion was available, the criterion was comfortably within the 95% confidence limits calculated by PEAT.

4.3. Verification of the tool

To date, some 400 verification tests have been carried out, changing at least one team variable in each test, in order to explore the behaviour of the tool's algorithms. A 'perfect' set of values was selected for each of the stage 1 and stage 2 rating scales (i.e. values for perfect individuals working in a perfect environment), while keeping the communications structure constant, and the likelihood of success was obtained from the tool. For the next tests, in succession, a value for one of the scales would be altered and the likelihood re-examined.

These tests were intended to answer three questions:

• Is the likelihood of success associated with the particular set of input values for the variables within the mathematical limits?

• Is the change in likelihoods consistent with the changes in the set of input values, and in keeping with theory?

• Are the likelihoods plausible, given the set of input values?

Two particular teams have been investigated in some detail. Both are teams of four; the difference between them is in the communication patterns. In a 'Linear' team, the members are arranged linearly, with two-way communication between adjacent members (similar to a production line). In a 'Cocktail' team the arrangement is that of a star, with a central person communicating with all others.


The others have partial communications between them; the net effect is that each team member has a different number of communication links. Subsequently, other teams of different size (down to 2 and up to 10 people) and communications structures (circular teams; 'dumbbell' teams – 2 teams joined together; and other combinations) have been explored for generality of the findings, with no changes to the conclusions discussed below.
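In outline, the verification procedure is a one-factor-at-a-time sweep, which might be scripted along these lines (peat_predict is a placeholder for the tool itself, which in reality is the spreadsheet):

    # One-factor-at-a-time verification sweep over a four-person team.
    def peat_predict(team) -> float:
        """Placeholder for PEAT's calculation; monotone in the ratings by construction."""
        return min(1.0, sum(sum(m) for m in team) / 48.0)

    baseline = [(3, 3, 3, 3)] * 4          # four 'perfect' members
    p0 = peat_predict(baseline)

    for i in range(4):                     # each member in turn
        for j in range(3):                 # trust, teamworking, knowledge
            team = [list(m) for m in baseline]
            team[i][j] -= 1                # degrade one rating by one step
            p = peat_predict(team)
            assert 0.0 <= p <= 1.0         # question 1: within mathematical limits
            assert p <= p0                 # question 2: degradation never raises p(success)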

All the tests have been performed using CREAM to characterise the organisational environment. These tests have led to the following overall conclusions about the behaviour of PEAT.

Firstly, none of the tests produced likelihoods outside the range 0.0–1.0. Secondly, no test produced a result outside the expectations arising from theory for that test. For reasons of brevity, only the more important conclusions are listed:

Tests 1–9: Effects of changing the organisational environment

• Conclusion 1: If the organisational and working environment is very good (i.e. attention paid to all human factors aspects), team quality hardly matters – p(success) for 'worst' team = 0.996; for 'best' team = 0.999. There was no difference between the linear and cocktail teams for these conditions.

• Conclusion 2: If the organisational and working environment is very bad, team quality is significant – p(success) for 'worst' team = 0.00000256; for 'best' team = 0.3336. Thus, a poor team in a poor environment is almost certain to fail, and keep on failing, despite repeat attempts. A good team stands a better chance of first-time success (one in three attempts), but for most practical scenarios this is likely to be too low to be acceptable. Again, there was no difference between the linear and cocktail teams for these conditions.

Tests 9–40 and 123–154: Effect of varying the quality of each individual in turn

• Conclusion 6: With reference to Conclusion 2 above, in the 'worst' working environment, with only one 'poor' performer and the rest 'good', p(success) can drop as low as 0.082. This is better than when all four are poor performers and shows the effects of 'good' neighbours; but note that in a 'bad' environment, good neighbours will not give a decent p(success).

• Conclusion 10: An individual's effect on the team is greater in poor working environments. Thus, a 'poor' performer will have a bigger effect in dragging down the performance of neighbouring team members in a 'poor' environment than in a 'good' one.

Tests 41–50, 179–188: Effect of varying the organisational environment

• Conclusion 12: p(success) is very dependent on the organisational environment. There is about a 67% drop in performance for the 'best' team as their working environment goes from 'best' to 'worst'. For a poor quality team (all rated at level 2), the fall-off in performance is nearly 100%.

Tests 51–74: Effect of supervision

• Conclusion 13: If all individuals are high quality, there is no discernible effect of the supervisor; 'no supervisor' is a possible organisational choice. This applies to both team structures, i.e. 'cocktail' and 'linear' teams.

• Conclusion 14: In poor working conditions, if the supervisor is poor, his/her role has a significant negative effect on team success. This effect is greater than for a non-supervisory role, indicating that people selected for a leadership role should be at least average performers, and preferably performance leaders.

• Conclusion 17: The supervisor effect increases with more communication links to the rest of the team.

Tests 75–122: Effects of people of different quality

• Conclusion 18a: In a good working environment, there is almost no (long-term) effect of interchanging people among the roles (because, for example, each role will be properly supported).

• Conclusion 20: The effect of a good person in a role depends on the number of links to other roles – the more there are, the better the effect. In other words, if there are central roles, that's where the good people should be.

Tests 155–178: Effects of varying team attributes

• Conclusion 22: Changes to the 'Trust in the delivery of competence' ratings produce greater changes in p(success) than changes to the other attributes, i.e. 'Teamworking' and 'Knowledge'. Ensuring trust within the team is of great importance.

From inspection of all of these tests, some general findings can be distilled.

1. Given that the tool has been constructed as a simple technique, with no feedback loops and no 'if-then' rules, it is gratifying that the behaviour is as expected; trends are consistent, and no test produced a prediction outside the range 0.0–1.0.

2. It is striking that in all sets of tests, a well-designed working environment is the biggest contributor to p(success). The other variables in the tool become important as the working environment degenerates, but they cannot make up for it. In a good working environment, team variables are relatively unimportant; even when the team is of poor quality, p(success) > 0.9.

3. In poor working environments, the most important variable is the quality of the team members. As long as there are several high quality members in central roles, able to communicate with the other members, then a level of performance (p(success) ≈ 0.25) can be achieved. However, even with a high quality team, it is not possible to lift p(success) above 0.35.

4. A good team is always better than 4 individuals.

The truth of these statements depends on the validation studies, and more are being carried out. But, insofar as these statements are true, findings 2 and 3 together have a significant corollary; they provide a strong argument for the importance of Human Factors/Ergonomics in engineering projects. One could state that, 'To the extent that human issues are ignored in systems design, system failure becomes guaranteed'. This is the unexpected finding to which reference was made earlier, and whose truth content rests on the validation studies discussed earlier.

Currently, attempts to justify resources being devoted to Human Factors Integration (HFI) in the design of systems are largely made by appeals to instances from the past; Hendrick's article is an example (Hendrick, 1997). However, the argument above is an ab initio argument, applicable to any team scenario. It is likely that those responsible for HFI will be able to derive risk and cost estimates that will provide strong arguments for improved funding of HFI and hence better outcomes for systems projects.

5. Limitations of the tool

These limitations refer to version 9.1 of the tool:

• The tool presupposes that the team will be co-operative and motivated to perform. It is not able to address all examples of teams that become dysfunctional due to internal conflict, or other reasons, unless this aspect can be addressed using one of the existing variables (e.g. a low score for trustworthiness).

• The core assumptions in the tool mean that the prediction is for long-term success.

• The tool does not predict spikes or pits in performance due to short-term special circumstances.

• While PEAT produces 95% confidence limits, it should be noted that 3 of the 18 validation tests did not include the criterion value. Some improvement to PEAT may be necessary to improve precision.

• The tool does not deal directly with teams of changing membership – so-called 'hotel' teams. Such teams can be considered by suitable segmentation of the process(es) undertaken by the team.

• Teams which are replenished over time could be assessed by having a proportion of the members rated as novices.

6. Conclusions

A tool with some power to predict team performance has been produced. It is believed to be the only tool able to provide quantitative estimates of team performance available for systems designers to use in the early stages of design. That it is usable by non-experts in human factors is believed to be an asset; if the engineer's own use of the tool shows that human factors issues must be addressed, this is likely to be a convincing argument.

That the tool is in EXCEL spreadsheet form is good, but insufficient; it is necessary that a web-based version is available, with a management process associated with it. Some plans exist for this.


It should be noted that PEAT was produced as part of an Engineering Doctorate, funded by the UK Engineering and Physical Sciences Research Council, with additional industrial sponsorship. The thesis and the tool are available through the Pilkington Library, Loughborough University, LE11 3TU, UK. Because of commercial confidentiality, some restrictions apply.

Acknowledgements

This work was funded jointly by the UK Engineering & Physical Sciences Research Council and by BAE Systems. The latter have reviewed and approved this paper.

Early work in exploring the outlines of the problem and the outlines of a solution was carried out over several years by students at Loughborough University: Jonathan Benn, Nick Read, Bill Fiske, and Isabel Smith. Many others have provided insight and knowledge to this exercise; those most directly involved are Mr. P. Wilkinson, Dr. T. Hughes, Dr. A. Leggatt, Dr. N. Colford and Dr. M. Williams of BAE Systems, and Dr. B. Kirwan of Eurocontrol. Their contributions are saluted; Isaac Newton expressed my feelings about their many contributions well: "If I have seen a little further it is by standing on the shoulders of Giants." (letter to Robert Hooke, February 5, 1676).

References

Barrick, M.R., Mount, M.K., 1991. The Big Five personality dimensions and job performance: a meta-analysis. Personnel Psychology 44 (1), 1–26.

Benn, J., 2005. Soft Metrics: Development and Application of a Framework for the Measurement of Human and Organisational Factors in Projects. Ph.D. thesis, Systems Engineering, Loughborough University, Loughborough.

Boehm, B., 1988. A spiral model of software development and enhancement. Computer (May), 61–72.

Bot, P.L., 2003. Methodological validation of MERMOS by 160 analyses. In: Proceedings of the International Workshop: Building the New HRA: Errors of Commission from Research to Application. OECD, Issy-les-Moulineaux.

Cochran, W.G., 1954. Some methods for strengthening the common Chi-squared tests. Biometrics 10, 417–451.

Cooper, S.E., Ramey-Smith, A.M., et al., 1996. A Technique for Human Error Analysis (ATHEANA). US Nuclear Regulatory Commission, Washington, DC.

Cummings, T.G., Molloy, E.S., et al., 1977. A methodological critique of fifty-eight selected work experiments. Human Relations 30 (8), 675–708.

Dekker, S.W.A., 2005a. Ten Questions about Human Error: A New View of Human Factors and System Safety. Lawrence Erlbaum Associates, New Jersey.

Dekker, S.W.A., 2005b. Why We Need New Accident Models. Lund University School of Aviation, Ljungbyhed, Sweden.

Depken, C.A., Haglund, L., 2007. Peer Effects in Team Sports: Empirical Evidence from NCAA Relay Teams. North American Association of Sports Economists, Charlotte, NC, USA. Working Paper Series: Paper No. 07-29.

Devine, D.J., 2002. A review and integration of classification systems relevant to teams in organizations. Group Dynamics: Theory, Research, and Practice 6 (4), 291–310.

Devine, D.J., Philips, J.L., 2001. Do smarter teams do better – a meta-analysis of cognitive ability and team performance. Small Group Research 32 (5), 507–532.

Dougherty, E.M., 1990. Human reliability analysis: need, status, trends, and limitations. Reliability Engineering and System Safety 29, 283–299.

Embrey, D.E., Humphreys, F., et al., 1984. SLIM-MAUD: An Approach to Assessing Human Error Probabilities Using Structured Expert Judgement. Nuclear Regulatory Commission.

Hare, A.P., 2003. Roles, relationships, and groups in organisations: some conclusions and recommendations. Small Group Research 34 (2), 123–154.

Haskins, C., 2010. Systems Engineering Handbook – A Guide for System Lifecycle Processes and Activities. International Council on Systems Engineering, San Diego, USA.

Hendrick, H., 1997. Good ergonomics is good economics. Ergonomics in Design (April): special insert.

Hollnagel, E., 1996. Reliability analysis and operator modelling. Reliability Engineering and System Safety 52, 327–337.

Hollnagel, E., 1998. Cognitive Reliability and Error Analysis Method (CREAM). Elsevier Science, Den Haag.

Hollnagel, E., 2005. Human reliability assessment in context. Nuclear Engineering and Technology 37 (2), 159–166.

Johnson, C., Kirwan, B., et al., 2009. The interaction between safety culture and degraded modes: a survey of national infrastructures for air traffic management. Risk Management 11 (3–4), 241–284.

Kirwan, B., 1988. A comparative evaluation of five human reliability assessment techniques. In: Sayers, B.A. (Ed.), Human Factors and Decision Making. Elsevier, London, pp. 87–109.

Kirwan, B., 1994. A Guide to Practical Human Reliability Assessment. Taylor & Francis, London.

Kirwan, B., 2005. Human reliability assessment. In: Wilson, J.R., Corlett, E.N. (Eds.), Evaluation of Human Work. Taylor & Francis, London, pp. 833–877.

Kirwan, B., Gibson, H., 2007. CARA: a human reliability assessment tool for air traffic safety management – technical basis and preliminary architecture. In: Redmill, F., Anderson, T. (Eds.), The Safety of Systems: Proceedings of the XV Safety-critical Systems Symposium. Springer, London, pp. 197–214.

Kozlowski, S.W.J., Ilgen, D.R., 2006. Enhancing the effectiveness of work groups and teams. Psychological Science in the Public Interest 7 (3), 77–124.

Lee, K.W., Tillman, F.A., et al., 1988. A literature survey of the human reliability component in a man–machine system. IEEE Transactions on Reliability 37 (1), 24–34.

Leonard, H.S., Freedman, A.M., 2000. From scientific management through fun and games to high-performing teams: a historical perspective on consulting to team-based organizations. Consulting Psychology Journal: Practice and Research 52 (1), 3–19.

Mathieu, J.E., Heffner, T.S., et al., 2000. The influence of shared mental models on team process and performance. Journal of Applied Psychology 85 (2), 273.

McGrath, J.E., 1984. Groups: Interaction and Performance. Prentice Hall, Englewood Cliffs, NJ.

Perrow, C., 1999. Normal Accidents – Living with High-risk Technologies. Princeton University Press, Princeton, NJ.

Salas, E., Sims, D.E., et al., 2005. Is there a "Big Five" in teamwork? Small Group Research 36 (5), 555–599.

Shanahan, P., 2005. The Shanahan model. In: Essens, P., Vogelaar, A., Mylle, J., et al. (Eds.), Military Command Team Effectiveness: Model and Instrument for Assessment and Improvement. NATO RTO TR-HFM-087. NATO Research & Technology Organisation, Neuilly-sur-Seine, France. AC/323(HFM-087)TP/59: 4-9–4-11.

Siegel, S., Castellan, N.J., 1995. Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, London.

Stewart, G.L., 2006. A meta-analytic review of relationships between team design features and team performance. Journal of Management 32, 29–55.

Sundstrom, E., DeMeuse, K.P., et al., 1990. Work teams – applications and effectiveness. American Psychologist 45 (2), 120–133.

Sundstrom, E., McIntyre, M., et al., 2000. Work groups: from the Hawthorne studies to work teams of the 1990s and beyond. Group Dynamics: Theory, Research and Practice 4 (1), 44–67.

Swain, A.D., Guttmann, H.E., 1983. A Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications. United States Nuclear Regulatory Commission, Washington, DC.

Tesluk, P., Mathieu, J.E., et al., 1997. Task and aggregation issues in the analysis and assessment of team performance. In: Brannick, M.T., Salas, E., Prince, C. (Eds.), Team Performance Assessment and Measurement. Lawrence Erlbaum Associates, New Jersey, pp. 197–224.

Thurstone, L.L., 1934. The vectors of mind. Psychological Review 41, 1–32.

Tziner, A., Eden, D., 1985. Effects of crew composition on crew performance: does the whole equal the sum of the parts? Journal of Applied Psychology 70, 85–93.

VanDeVen, A.H., Delbecq, A.L., et al., 1976. Determinants of coordination modes within organizations. American Sociological Review 41 (2), 322–338.

Williams, J.C., 1986. HEART – a proposed method for assessing and reducing human error. In: 9th Advances in Reliability Technology Symposium. University of Bradford, UK.